CyberAgentAILab/flex-dm

[CVPR 2023 highlight] Towards Flexible Multi-modal Document Models

Emerging

This project helps developers build document understanding systems for complex, multi-modal documents such as advertisements, posters, and mobile app interfaces. It takes these visual documents as input and learns to extract structured information or run analyses that draw on both their text and their visual layout. Developers building AI systems for document analysis can use it to train models tailored to their own data.

No commits in the last 6 months.

Use this if you are a machine learning engineer or researcher looking to train custom models for understanding the content and layout of visually rich documents.

Not ideal if you need an out-of-the-box solution for document processing without custom model training or deep technical expertise.

document-intelligence computer-vision natural-language-processing layout-analysis multi-modal-ai

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 7 / 25

Language: Python

License: Apache-2.0

Higher-rated alternatives

PRIS-CV/DemoFusion

Let us democratise high-resolution generation! (CVPR 2024)

mit-han-lab/distrifuser

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Tencent-Hunyuan/HunyuanPortrait

[CVPR-2025] The official code of HunyuanPortrait: Implicit Condition Control for Enhanced...

giuvecchio/matfuse

MatFuse: Controllable Material Generation with Diffusion Models (CVPR2024)

Shilin-LU/TF-ICON

[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official...
