aimagelab/ReT
[CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
This project helps you find specific documents from a large collection by understanding both text and images. You provide a question or description, possibly with an image, and it returns relevant documents that match your query, even if the information is spread across text and visuals. This is ideal for researchers, analysts, or anyone who needs to accurately retrieve information from complex, multimodal datasets.
No commits in the last 6 months.
Use this if you need to perform highly accurate searches on documents that contain both written text and images, and traditional text-only search engines aren't precise enough.
Not ideal if your retrieval needs are purely text-based or if you are looking for a simple keyword search solution.
Stars: 34
Forks: 1
Language: Python
License: Apache-2.0
Category:
Last pushed: Sep 12, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/aimagelab/ReT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
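For readers who would rather call the endpoint from Python than from curl, the request above can be sketched with the standard library alone. The helper names `quality_url` and `fetch_quality` are hypothetical. The sketch also assumes that the response body is JSON and that the `owner/repo` suffix of the URL generalizes to other repositories; neither is confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/embeddings"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository such as aimagelab/ReT.

    Assumption: other repositories follow the same /<owner>/<repo> pattern.
    """
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Send a keyless GET request and decode the body.

    Assumption: the service returns JSON; its schema is not documented here,
    so the decoded object is returned as-is.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("aimagelab", "ReT")` would issue the same request as the curl command. Keyless access is limited to 100 requests per day, so a script that polls many repositories should cache its results.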
Higher-rated alternatives
debanjan06/geospatial-rag
AI Framework for Remote Sensing Image Analysis using RAG - 88%+ accuracy, multi-modal queries,...
berntpopp/phentrieve
AI-powered system for mapping clinical text to Human Phenotype Ontology (HPO) terms using...
Atharv279/RAGify-Finance
Benchmarks Cohere vs HuggingFace embeddings for financial document Q&A using RAG
SirSail/Priqualis
Pre-submission compliance validator for healthcare claims. Combines rule-based validation (YAML...
IanD25/principia-diagnostics
Graph coherence engine for research datasets — Fisher Information diagnostics, two-tier reports,...