FuxiaoLiu/DocumentCLIP
[ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
This project helps you understand how figures and main body text are related within complex documents like news articles, magazines, or product descriptions. It takes a document containing text and multiple images as input and helps identify the specific text segments that describe or refer to each image. This is useful for anyone working with rich, visual documents where understanding the link between text and images is crucial.
No commits in the last 6 months.
Use this if you need to automatically identify which parts of a document's main text are associated with its various figures, beyond just a simple caption.
Not ideal if you are only interested in single image-text pairs or if your documents have very simple, clearly defined figure captions without broader textual references.
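To make the task concrete, here is a minimal sketch of linking each figure to its most related sentence by cosine similarity. It is not DocumentCLIP's actual model or API: the function names (`cosine`, `link_figures_to_sentences`) and the tiny hand-made embedding vectors are assumptions for illustration only. A real system would get the vectors from trained image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def link_figures_to_sentences(figure_vecs, sentence_vecs):
    """Map each figure id to the index of its most similar sentence.

    Hypothetical helper: figure_vecs is {figure_id: vector},
    sentence_vecs is a list of vectors in document order.
    """
    links = {}
    for fig_id, fig_vec in figure_vecs.items():
        scores = [cosine(fig_vec, sent_vec) for sent_vec in sentence_vecs]
        links[fig_id] = max(range(len(scores)), key=scores.__getitem__)
    return links


# Toy embeddings (made up for this example, not produced by any model).
figure_vecs = {"fig1": [1.0, 0.0, 0.2], "fig2": [0.0, 1.0, 0.1]}
sentence_vecs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3], [0.5, 0.5, 0.5]]

links = link_figures_to_sentences(figure_vecs, sentence_vecs)
print(links)
```

Picking only the single best sentence per figure is a simplification. The use case described above involves broader textual references, so a practical variant would likely keep every sentence above a similarity threshold.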
Stars: 16
Forks: not listed
Language: Python
License: not listed
Category:
Last pushed: Apr 04, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/FuxiaoLiu/DocumentCLIP"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
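The same request can be made from Python using only the standard library. This is a sketch under assumptions: the endpoint path is copied from the curl command above, the response is assumed to be JSON (the page does not show its schema), and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the "transformers" segment is
# kept as-is because the page does not explain what it stands for.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(repo):
    """Build the endpoint URL for an "owner/name" repository string."""
    return f"{BASE_URL}/{quote(repo, safe='/')}"


def fetch_quality(repo, timeout=10):
    """Fetch and decode the response, assuming the body is JSON."""
    with urllib.request.urlopen(quality_url(repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("FuxiaoLiu/DocumentCLIP")
# print(data)
```

With the keyless limit of 100 requests/day, it may be worth caching responses locally rather than refetching on every run.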
Higher-rated alternatives
OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Kaushalya/medclip
A multi-modal CLIP model trained on the medical dataset ROCO
kastalimohammed1965/CLIP-fine-tune-registers-gated
Vision Transformers Needs Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!
BUAADreamer/SPN4CIR
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives...
clip-italian/clip-italian
CLIP (Contrastive Language-Image Pre-training) for Italian