CyberAgentAILab/flex-dm
[CVPR 2023 highlight] Towards Flexible Multi-modal Document Models
This project helps developers build document understanding systems for complex, multi-modal documents such as advertisements, posters, and mobile app interfaces. It takes these visual documents as input and can learn to extract structured information or perform other analyses based on both their text and their visual layout. Developers building AI systems for document analysis can use it to train models tailored to their own data.
No commits in the last 6 months.
Use this if you are a machine learning engineer or researcher looking to train custom models for understanding the content and layout of visually rich documents.
Not ideal if you need an out-of-the-box document processing solution that requires no custom model training or deep technical expertise.
Stars: 59
Forks: 3
Language: Python
License: Apache-2.0
Last pushed: Sep 07, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/CyberAgentAILab/flex-dm"
Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
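The same request can be made from Python using only the standard library. This is a minimal sketch, assuming the URL follows a `quality/<category>/<owner>/<repo>` pattern inferred from the single curl URL above and that the endpoint returns JSON; neither assumption is documented on this page. The helper names `quality_url` and `fetch_quality` are hypothetical, and API-key handling is left out because the page does not say how a key is passed.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example; the <category>/<owner>/<repo>
# layout appended to it is an assumption inferred from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repo given as "owner/name" (hypothetical helper)."""
    owner, name = repo.split("/", 1)
    # Percent-encode each path segment so unusual characters cannot break the path.
    segments = (category, owner, name)
    return BASE_URL + "/" + "/".join(urllib.parse.quote(s, safe="") for s in segments)


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> dict:
    """GET the endpoint anonymously (no key) and parse the body as JSON (assumed format)."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("diffusion", "CyberAgentAILab/flex-dm")` requests the same URL as the curl command; anonymous calls like this count toward the 100 requests/day limit.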
Higher-rated alternatives
PRIS-CV/DemoFusion
Let us democratise high-resolution generation! (CVPR 2024)
mit-han-lab/distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Tencent-Hunyuan/HunyuanPortrait
[CVPR-2025] The official code of HunyuanPortrait: Implicit Condition Control for Enhanced...
giuvecchio/matfuse
MatFuse: Controllable Material Generation with Diffusion Models (CVPR2024)
Shilin-LU/TF-ICON
[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official...