CyberAgentAILab/flex-dm

[CVPR 2023 highlight] Towards Flexible Multi-modal Document Models

Overall score: 31 / 100 (Emerging)

This project helps developers build document understanding systems that can process complex, multi-modal documents like advertisements, posters, or mobile app interfaces. It takes these visual documents as input and can learn to extract structured information or perform various analyses based on both their text and visual layout. Developers working on AI systems for document analysis would use this to create models tailored to their specific data.

No commits in the last 6 months.

Use this if you are a machine learning engineer or researcher looking to train custom models for understanding the content and layout of visually rich documents.

Not ideal if you need an out-of-the-box solution for document processing without custom model training or deep technical expertise.

Tags: document-intelligence, computer-vision, natural-language-processing, layout-analysis, multi-modal-ai
Status: Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 7 / 25

The four category scores add up to the overall score: 0 + 8 + 16 + 7 = 31 out of 100.

Stars: 59
Forks: 3
Language: Python
License: Apache-2.0
Last pushed: Sep 07, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/CyberAgentAILab/flex-dm"

The API is open to everyone: 100 requests/day without a key, or 1,000 requests/day with a free key.
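The same request can be made from Python. The following is a minimal sketch using only the standard library. The function names `build_quality_url` and `fetch_quality` are hypothetical helpers, the `diffusion` path segment is copied verbatim from the curl command above (treating it as a category is an assumption), and the sketch assumes the endpoint returns JSON, which this page does not state.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL (hypothetical helper).

    `category` is assumed to be the path segment that precedes the
    owner/repo pair ("diffusion" in the curl example).
    """
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{API_BASE}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for one repository.

    Assumes the response body is JSON; the page does not document
    the response schema, so the parsed value is returned untouched.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("diffusion", "CyberAgentAILab", "flex-dm")` issues the same GET request as the curl command. With the unauthenticated limit of 100 requests/day, it is worth caching the result rather than calling the endpoint in a loop.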