aimagelab/ReT

[CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval

Experimental

ReT retrieves documents from a large collection by using both their text and their images. You provide a question or description, optionally paired with an image, and it returns the documents that best match the query, even when the relevant information is split between text and visuals. It suits researchers, analysts, and anyone who needs accurate retrieval over complex multimodal datasets.
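As a rough illustration of the retrieval loop described above (a text query, optionally paired with an image, scored against a document collection), here is a minimal Python sketch of generic embedding-based ranking. It is not ReT's actual method: the averaging fusion, the cosine scoring, and the helper names `fuse_query` and `retrieve` are hypothetical, and the sketch assumes text and image encoders that already map inputs into one shared vector space.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def fuse_query(text_vec: Vector, image_vec: Optional[Vector] = None) -> List[float]:
    """Hypothetical fusion: average the unit-length text and image vectors.

    With no image, the query is just the normalized text vector.
    """
    t = l2_normalize(text_vec)
    if image_vec is None:
        return t
    i = l2_normalize(image_vec)
    if len(i) != len(t):
        raise ValueError("text and image vectors must share a dimension")
    return l2_normalize([(a + b) / 2 for a, b in zip(t, i)])


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def retrieve(query: Vector, docs: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Return the top-k (doc_id, score) pairs, highest cosine similarity first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" standing in for real encoder outputs.
    docs = {"doc_a": [1.0, 0.0, 0.0], "doc_b": [0.0, 1.0, 0.0], "doc_c": [0.7, 0.7, 0.0]}
    query = fuse_query([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
    for doc_id, score in retrieve(query, docs):
        print(f"{doc_id}: {score:.3f}")
```

In this toy setup, a text-only query ranks `doc_a` first, while adding an image vector pulls the fused query toward `doc_c`, which mixes both signals. A real multimodal retriever would learn the fusion rather than averaging fixed vectors.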

No commits in the last 6 months.

Use this if you need to perform highly accurate searches on documents that contain both written text and images, and traditional text-only search engines aren't precise enough.

Not ideal if your retrieval needs are purely text-based or if you are looking for a simple keyword search solution.

multimodal-search document-intelligence information-retrieval research-assist data-mining

Stale 6m No Package No Dependents

Maintenance 2 / 25

Adoption 7 / 25

Maturity 16 / 25

Community 3 / 25

Language

Python

License

Apache-2.0

Higher-rated alternatives

debanjan06/geospatial-rag

AI Framework for Remote Sensing Image Analysis using RAG - 88%+ accuracy, multi-modal queries,...

berntpopp/phentrieve

AI-powered system for mapping clinical text to Human Phenotype Ontology (HPO) terms using...

Atharv279/RAGify-Finance

Benchmarks Cohere vs HuggingFace embeddings for financial document Q&A using RAG

SirSail/Priqualis

Pre-submission compliance validator for healthcare claims. Combines rule-based validation (YAML...

IanD25/principia-diagnostics

Graph coherence engine for research datasets — Fisher Information diagnostics, two-tier reports,...