FuxiaoLiu/DocumentCLIP

[ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents


Experimental

DocumentCLIP models how figures relate to the main body text of complex documents such as news articles, magazines, and product descriptions. It takes a document containing text and multiple images as input and helps identify the text segments that describe or refer to each image. This is useful when working with rich, visual documents where the link between text and images matters.
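The input and output described above can be illustrated with a minimal sketch. It is not DocumentCLIP's code or API: it assumes each figure and each text segment has already been turned into an embedding vector by some encoder, and it links each figure to its most similar segment by cosine similarity. The function names (`cosine`, `link_figures_to_text`) and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_figures_to_text(figure_vecs, segment_vecs):
    """For each figure id, return the index of the most similar text segment."""
    links = {}
    for fig_id, fvec in figure_vecs.items():
        scores = [cosine(fvec, svec) for svec in segment_vecs]
        links[fig_id] = max(range(len(scores)), key=scores.__getitem__)
    return links


# Toy, hand-made embeddings (hypothetical values, not model output).
figure_vecs = {"fig1": [1.0, 0.1, 0.0], "fig2": [0.0, 0.2, 1.0]}
segment_vecs = [
    [0.9, 0.0, 0.1],  # segment 0
    [0.1, 1.0, 0.0],  # segment 1
    [0.0, 0.1, 0.9],  # segment 2
]

links = link_figures_to_text(figure_vecs, segment_vecs)
print(links)
```

In a real pipeline the vectors would come from image and text encoders, and a figure may relate to several segments rather than a single best match, so a score threshold or top-k selection could replace the `max` step.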

No commits in the last 6 months.

Use this if you need to automatically identify which parts of a document's main text are associated with its various figures, beyond just a simple caption.

Not ideal if you are only interested in single image-text pairs or if your documents have very simple, clearly defined figure captions without broader textual references.

content-analysis document-understanding multimedia-publishing information-extraction digital-archives

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 0 / 25


Stars

Forks

—

Language

Python

License

—

Higher-rated alternatives

OFA-Sys/Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Kaushalya/medclip

A multi-modal CLIP model trained on the medical dataset ROCO

kastalimohammed1965/CLIP-fine-tune-registers-gated

Vision Transformers Needs Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!

BUAADreamer/SPN4CIR

[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives...

clip-italian/clip-italian

CLIP (Contrastive Language–Image Pre-training) for Italian
