DataFog/vlm-api

REST API for computing cross-modal similarity between images and text using the ColPali vision-language model

Experimental

This API helps document analysts, researchers, and knowledge managers retrieve information from visually rich documents such as reports, infographics, and manuals. You provide a document (image or PDF) and a text query; it returns a highlighted copy of the document showing which regions relate to the query, along with a detailed similarity score. The result shows which parts of a document's images or layout are most relevant to your search.
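To make the similarity score more concrete: ColPali-family models compare a query and a page with late interaction (often called MaxSim), where each query token embedding is matched to its best image patch embedding and the per-token maxima are summed. The sketch below illustrates that idea with toy vectors in plain Python. It is not taken from this project's code; the function names and the embeddings are hypothetical, and whether this API computes its score or its highlight exactly this way is an assumption.

```python
# Sketch of ColPali-style late-interaction (MaxSim) scoring.
# Function names and embeddings are hypothetical toy values, not model output
# and not this project's actual implementation.
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], image_patches: List[Vector]) -> float:
    """Sum, over query tokens, of the similarity to the best-matching patch."""
    return sum(max(dot(q, p) for p in image_patches) for q in query_tokens)


def patch_relevance(query_tokens: List[Vector], image_patches: List[Vector]) -> List[float]:
    """Per-patch relevance: the highest similarity any query token has with
    that patch. Values like these could drive a highlight overlay."""
    return [max(dot(q, p) for q in query_tokens) for p in image_patches]


# Two query token embeddings and three image patch embeddings (toy data).
query = [[1.0, 0.0], [0.0, 1.0]]
patches = [[0.9, 0.1], [0.2, 0.8], [0.0, 0.0]]

score = maxsim_score(query, patches)
relevance = patch_relevance(query, patches)
```

In this toy setup the empty third patch gets zero relevance, while the first two patches each match one query token strongly. A real deployment would obtain the embeddings from the model rather than hard-coding them.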

No commits in the last 6 months.

Use this if you need to find specific information in documents that rely heavily on visual elements such as charts, diagrams, or complex layouts, where traditional text-based search tools often miss the context.

Not ideal if your documents are primarily text-based with minimal visual content, as standard text retrieval methods might be more efficient.

document-analysis information-retrieval research-analytics visual-search knowledge-management

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 4 / 25

Maturity 16 / 25

Community 9 / 25

Language: Python

License: MIT

Higher-rated alternatives

illuin-tech/colpali

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

AnswerDotAI/byaldi

Use late-interaction multi-modal models such as ColPali in just a few lines of code.

jolibrain/colette

Multimodal RAG to search and interact locally with technical documents of any kind

nannib/nbmultirag

A framework in Italian and English that lets you chat with your own documents using RAG,...

OpenBMB/VisRAG

Parsing-free RAG supported by VLMs
