aimagelab/ReT-2
Recurrence Meets Transformers for Universal Multimodal Retrieval
This project helps researchers and data scientists efficiently search large collections of mixed text-and-image data. You provide a query (text, an image, or both) and a dataset of multimodal documents; it returns the most relevant documents, enabling tasks like answering complex questions or finding related content across media types.
Use this if you need to find specific information or content by searching with both text and images against a vast, diverse dataset of multimodal documents.
Not ideal if your data is exclusively text-based or image-based, or if you need to perform quick, ad-hoc searches on a small, simple collection.
Stars: 15
Forks: —
Language: Python
License: Apache-2.0
Category: —
Last pushed: Dec 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/aimagelab/ReT-2"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
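The same request can be made programmatically. A minimal Python sketch is below; note that the `Authorization: Bearer` header scheme and the JSON response body are assumptions, not documented above — only the endpoint URL and rate limits come from this page.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-data endpoint URL for a given repository."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str, api_key=None) -> dict:
    """Fetch quality data for a repo.

    Without a key you get 100 requests/day; pass a free key for 1,000/day.
    The Bearer-token header used here is an assumed auth scheme -- check
    the API docs for the actual mechanism.
    """
    req = urllib.request.Request(quality_url(owner, repo))
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")  # assumed scheme
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)  # assumes a JSON response body
```

For example, `fetch_quality("aimagelab", "ReT-2")` issues the same GET request as the curl command shown above.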
Higher-rated alternatives
illuin-tech/colpali
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
AnswerDotAI/byaldi
Use late-interaction multi-modal models such as ColPali in just a few lines of code.
jolibrain/colette
Multimodal RAG to search and interact locally with technical documents of any kind
nannib/nbmultirag
A framework in Italian and English that lets you chat with your own documents via RAG, ...
OpenBMB/VisRAG
Parsing-free RAG supported by VLMs