Multimodal RAG Systems: RAG Tools

Tools and frameworks for retrieval-augmented generation systems that process and integrate multiple data modalities (images, text, video, audio, tables) together. Does NOT include single-modality RAG, domain-specific RAG applications, or general multimodal AI without retrieval components.

There are 105 multimodal RAG system tools tracked. 4 score 50 or above (Established tier). The highest-rated is illuin-tech/colpali at 59/100 with 2,555 stars. 1 of the top 10 is actively maintained.
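The tier labels follow score bands. The page does not publish the cut-offs, so the ones below are inferred from the listing itself:

- nannib/nbmultirag at 50 is Established.
- OpenBMB/VisRAG at 49 and zhaosuifeng/FinRAGBench-V at 30 are Emerging.
- dame-cell/VisionRAG at 29 is Experimental.

Treat this as a sketch under that assumption, not the tracker's own rule. Applied to the top 10 scores from the ranking, it reproduces the count of 4 Established tools.

```python
# Tier cut-offs INFERRED from the scores and tiers shown in the ranking;
# they are not a documented rule of the tracker.
ESTABLISHED_MIN = 50  # nannib/nbmultirag (50) is listed as Established
EMERGING_MIN = 30     # FinRAGBench-V (30) is Emerging; VisionRAG (29) is Experimental


def tier_for(score: int) -> str:
    """Map a 0-100 quality score to the tier label used in the ranking."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= ESTABLISHED_MIN:
        return "Established"
    if score >= EMERGING_MIN:
        return "Emerging"
    return "Experimental"


# Scores of the current top 10, copied from the ranking.
top10 = {
    "illuin-tech/colpali": 59,
    "AnswerDotAI/byaldi": 55,
    "jolibrain/colette": 51,
    "nannib/nbmultirag": 50,
    "OpenBMB/VisRAG": 49,
    "chiang-yuan/llamp": 47,
    "Leon1207/Video-RAG-master": 44,
    "cilabuniba/artseek": 43,
    "tonywu71/colpali-cookbooks": 42,
    "llm-lab-org/Multimodal-RAG-Survey": 41,
}

established = [name for name, s in top10.items() if tier_for(s) == "Established"]
print(len(established))  # 4
```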

Get the projects as JSON (the example request below sets limit=20)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=multimodal-rag-systems&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
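The same request can be built with Python's standard library. This is a minimal sketch that assumes the endpoint accepts exactly the query parameters shown in the curl example. The helper names build_quality_url and fetch_quality are hypothetical. The sketch sends no API key, because the page does not say how a key is passed.

```python
import json
import urllib.parse
import urllib.request

# Base URL copied from the curl example; parameter names match its query string.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str = "rag",
                      subcategory: str = "multimodal-rag-systems",
                      limit: int = 20) -> str:
    """Return the dataset URL for one domain/subcategory with a result limit."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # urlencode keeps dict insertion order and percent-encodes values safely.
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_quality(url: str, timeout: float = 10.0):
    """GET the URL and decode the JSON body (anonymous, key-less tier only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling fetch_quality(build_quality_url()) performs the same GET as the curl command. The page does not document the shape of the returned JSON, so inspect it before relying on specific field names. It also does not say whether limit values above 20 are accepted.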

Rank  Tool  Score  Tier
1 illuin-tech/colpali

The code used to train and run inference with the ColVision models, e.g....

59
Established
2 AnswerDotAI/byaldi

Use late-interaction multi-modal models such as ColPali in just a few lines of code.

55
Established
3 jolibrain/colette

Multimodal RAG to search and interact locally with technical documents of any kind

51
Established
4 nannib/nbmultirag

A framework in Italian and English that lets you chat with your own...

50
Established
5 OpenBMB/VisRAG

Parsing-free RAG supported by VLMs

49
Emerging
6 chiang-yuan/llamp

[EMNLP '25] A web app and Python API for multi-modal RAG framework to ground...

47
Emerging
7 Leon1207/Video-RAG-master

✨✨[NeurIPS 2025] This is the official implementation of our paper...

44
Emerging
8 cilabuniba/artseek

ArtSeek: Deep artwork understanding via multimodal in-context reasoning and...

43
Emerging
9 tonywu71/colpali-cookbooks

Recipes for learning, fine-tuning, and adapting ColPali to your multimodal...

42
Emerging
10 llm-lab-org/Multimodal-RAG-Survey

A Survey on Multimodal Retrieval-Augmented Generation

41
Emerging
11 JuliaGenAI/ColBERT.jl

Efficient late-interaction retrieval systems in Julia!

41
Emerging
12 deep-div/Multimodel-RAG

Multimodal RAG ingests PDFs and generates combined text and image outputs by...

39
Emerging
13 ACMarcone86/artseek

ArtSeek combines late-interaction retrieval over a 5M+ multimodal corpus...

38
Emerging
14 adithya-s-k/VARAG

Vision-Augmented Retrieval and Generation (VARAG): a vision-first RAG engine

37
Emerging
15 wgcyeo/UniversalRAG

UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse...

37
Emerging
16 chg0901/Honor_of_Kings_Multi-modal_Dataset

A Multi-modal RAG Project with Dataset from Honor of Kings, one of the most...

35
Emerging
17 the-bird-F/GLM-Voice-RAG

[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end...

34
Emerging
18 Ahmed-AI-01/Multimodal-RAG

An AI-powered chat application using text, audio, and images for...

34
Emerging
19 richard-peng-xia/RULE

[EMNLP'24] RULE: Reliable Multimodal RAG for Factuality in Medical Vision...

33
Emerging
20 joohyung00/lilac

This is the public repository for "LILaC: Late Interacting in Layered...

32
Emerging
21 MohamedMostafa259/pif-multimodal-rag

A modular, multilingual, and multimodal Retrieval-Augmented Generation (RAG)...

32
Emerging
22 zhaosuifeng/FinRAGBench-V

FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the...

30
Emerging
23 Hoar012/RAP-MLLM

[CVPR 2025] RAP: Retrieval-Augmented Personalization

30
Emerging
24 AhmedAl93/multimodal-semantic-RAG

A RAG system designed to process documents with multimodal content. It can...

30
Emerging
25 dame-cell/VisionRAG

A novel multi-modality (vision) RAG architecture

29
Experimental
26 DataFog/vlm-api

REST API for computing cross-modal similarity between images and text using...

29
Experimental
27 ChaoLinAViy/OMGM

OMGM: Orchestrate Multiple Granularities and Modalities for Efficient...

29
Experimental
28 RodneyFinkel/groq_deepgram_agent

Multi-modal agent using Deepgram, Groq LPUs, and Sentence Transformers...

28
Experimental
29 santiago68310/RAG-based-multimodal-agent

A sophisticated Retrieval-Augmented Generation (RAG) system that combines...

27
Experimental
30 kyopark2014/llm-multimodal-and-rag

Shows how to use multimodal models and RAG with multi-region LLMs.

27
Experimental
31 aimagelab/ReT-2

Recurrence Meets Transformers for Universal Multimodal Retrieval

27
Experimental
32 SnowNation101/Nyx

Code for the paper “Towards Mixed-Modal Retrieval for Universal...

27
Experimental
33 Azure-Samples/multimodal_rag_python

Python notebook for solving overlapping tables problem with Azure document...

27
Experimental
34 DuhanJishnu/NeuraNexus

Offline Multimodal RAG System for Unified Retrieval from Text, Image, and Audio Data

26
Experimental
35 cany7/LumiCite

LumiCite is a multimodal RAG system for academic papers, designed for...

26
Experimental
36 naimkatiman/Multi-Modal-RAG-Pipeline-on-Images-and-Text-Locally

My first multi-modal RAG pipeline (dummy version)

25
Experimental
37 RecSys-lab/RAG-VisualRec

🧠 A Resource for Multi-Modal Learning in Visual RAGs

25
Experimental
38 MMDocRAG/MMDocRAG

The code used to train and run inference with MMDocRAG

24
Experimental
39 seth-woo/mkrs-optional-memory

Multimodal Knowledge Retrieval System with Optional Memory (MKRS)

24
Experimental
40 Alijanloo/MultiModalRag

A Multi-Modal Agentic RAG pipeline designed to handle unstructured documents...

24
Experimental
41 Rayen-Hamza/Klippy

A text-centric multimodal local first RAG system with knowledge graph...

23
Experimental
42 starsuzi/VideoRAG

VideoRAG: Retrieval-Augmented Generation over Video Corpus

23
Experimental
43 jiangnanboy/pdf_multimodal_rag

pdf multimodal rag ใ€pdfๅคšๆจกๆ€rag้—ฎ็ญ”ใ€‘

22
Experimental
44 THE-S0HAM/OmniWhale-RAG

Generalized, Offline-First Multimodal AI System

22
Experimental
45 GenCEO/mm-rag-playbook

Lightweight multimodal RAG patterns for PDF-like documents

22
Experimental
46 CKeibel/FHSWF-deep-learning

Multimodal RAG and comparisons between language models. (Project for Deep...

21
Experimental
47 connectpool/multimodal-rag-lab

Compact multimodal RAG baseline with chunking, BM25 retrieval and prompt assembly.

21
Experimental
48 medazizsaaadallah/Knowledge-Infused-Multimodal-Retrieval-A-RAG-Based-Approach-for-Context-Aware-Image-Understanding

🌟 Enhance image understanding through a RAG-based approach, combining...

21
Experimental
49 SainathPattipati/multi-modal-rag

RAG over images, PDFs, tables, and structured data: unified retrieval...

21
Experimental
50 alilooop/AssetRetrieval3D

๐ŸŒ Retrieve 3D assets effortlessly using text or images with this multi-modal...

21
Experimental
51 ResearchAgents/multimodal-doc-rag

A lightweight pipeline for multimodal document retrieval and QA using...

21
Experimental
52 aniketpoojari/Enterprise-AI-Assistant-MCP

Production-grade Multi-Modal RAG system for intelligent document Q&A with...

21
Experimental
53 nicolas-len/gcp-multimodal-ai-rag

Multimodal AI knowledge base, RAG on GCP with Gemini parsing, BigQuery...

21
Experimental
54 id4thomas/psi-king

Framework for building Multimodal Document Retrievers

21
Experimental
55 forfrt/vgsg_rag

Visual Grounded Story Generation with RAG

20
Experimental
56 RazerArdi/Knowledge-Infused-Multimodal-Retrieval-A-RAG-Based-Approach-for-Context-Aware-Image-Understanding

A modular RAG-based framework for image retrieval and context-aware...

20
Experimental
57 Bhavik-Ardeshna/Multimodal-VideoRAG

Multimodal-VideoRAG: Using BridgeTower Embeddings and Large Vision Language Models

20
Experimental
58 tph-kds/TriModalRAG_System

Built upon the integration of text, image, and audio modalities, this...

19
Experimental
59 SubhamIO/Multimodal-RAG-System

Handles a mixture of content types, including text, tables and images using...

19
Experimental
60 simoncampos1022/RAG-System-arXivRAG-Multimodal-Conversational

A practical, multimodal-multilingual RAG chatbot application powered by...

19
Experimental
61 TioeAre/BayesRAG

BayesRAG: Probabilistic Mutual Evidence Corroboration for Multimodal...

19
Experimental
62 DngBack/HPC-ColPali

Implementation of Hierarchical Patch Compression for ColPali: Efficient...

19
Experimental
63 adam-aimoscloud/MoleSearch

Multimodal data Retriever, including text, image, video, audio

18
Experimental
64 neha-nambiar/Retrieval-Augmented-Multimodal-AI-for-Engineering-Homework-Solving

Engineering Homework solver using ColPali PDF retrieval, Qwen2.5-VL...

18
Experimental
65 selvatharrun/Multimodal-RAG-Application

A comprehensive Multimodal Retrieval-Augmented Generation (RAG) application...

18
Experimental
66 WizKnight/MultimodalMovieRAG

A multimodal movie search engine using RAG techniques. It allows users to...

17
Experimental
67 AliHamzaAzam/multimodal-rag

Multimodal RAG system with CLIP embeddings, FAISS search, and MLX-powered Mistral LLM

17
Experimental
68 Ghost-141/Multi-Modal-Local-RAG

A Multi-Modal RAG Pipeline with Local LLMs

17
Experimental
69 SungJuyeon/multimodal_RAG_System

A question-answering system for uploaded images and videos

17
Experimental
70 muthusamir/GraphMultimodalRAG

Enhancing Vision-Language Retrieval with Graph-Based and Multimodal RAG Integration

17
Experimental
71 sakshamVerma08/MultiModal-RAG-Practice-

Multi-Modal RAG: Retrieval-Augmented Generation over Text and Visual PDFs A...

17
Experimental
72 Koushiki-Chakraborty/Multimodal-Question-Answering

Collaborative research exploring multimodal question answering using OCR,...

16
Experimental
73 MMDocRAG/MMDocIR

The code used to train and run inference with MMDocIR

16
Experimental
74 Ashutosh-AIBOT/multimodal-rag-research-assistant

Multi-source RAG assistant: chat with PDFs, research YouTube channels,...

16
Experimental
75 Nir0g0/Multimodal-RAG

This project is a multimodal Retrieval-Augmented Generation (RAG) system...

15
Experimental
76 winstonbartlegod/enhanced-multimodal-rag

Advanced document analysis and chat app with Mistral OCR, Marker (Gemini 2.5...

15
Experimental
77 RS2002/Image2Music

Official repository for the paper "Zero-Effort Image-to-Music Generation: An...

15
Experimental
78 Arnav000/Multimodal-RAG

This repository contains a full-stack Multimodal Retrieval-Augmented...

15
Experimental
79 AnithaKarre/multimodel_RAG

Multimodal RAG pipeline that ingests PDFs, Word docs, CSVs, Excel files, and...

14
Experimental
80 RitamPatra/rag-project

Multimodal RAG chatbot

14
Experimental
81 jthiruveedula/multimodal-rag-pipeline

End-to-end Multimodal RAG pipeline ingesting PDFs, images, and audio using...

14
Experimental
82 rutvik29/multimodal-rag

Production multimodal RAG pipeline: ingests PDFs, images, and tables with...

14
Experimental
83 robustvisrag/RobustVisRAG

CVPR26 - RobustVisRAG: Causality-Aware Vision-Based Retrieval-Augmented...

14
Experimental
84 Sreyan88/RECAP

Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning

14
Experimental
85 emrekuruu/local-multimodal-personal-knowledge-base

A multi-hop multimodal RAG system to chat with your PDFs locally, using...

14
Experimental
86 Schinkenwurst/lightmrag

Lightweight multimodal RAG baseline with late-fusion retrieval

13
Experimental
87 isatyamks/multimodal-rag

Multimodal RAG system for generating test cases and use cases from documents...

13
Experimental
88 sgxs2014/mmrag-toolkit

A minimal toolkit for Multimodal RAG: retrieve images and text, ground...

13
Experimental
89 prakhar175/multimodal-RAG-application

Multimodal PDF-based RAG application that scans the PDF for text and...

13
Experimental
90 easy1ive/modality-router-kit

Lightweight modality-aware query router for multimodal RAG experiments

13
Experimental
91 Moncef-Bj/cv-papers-rag

Multimodal RAG system for Computer Vision research papers with intelligent...

13
Experimental
92 Shubin-vadim/Arxplover

Comprehensive multimodal system for analyzing documents with support for...

13
Experimental
93 amitkumarj441/mRAG-gim

Code for CIKM'25 paper - Multimodal RAG Enhanced Visual Description

13
Experimental
94 suncatchin/visual-rag

Lightweight multimodal RAG pipeline for image-and-text understanding: CLIP...

13
Experimental
95 thomaskty/HybridRag

Multi-modal RAG system with vector embeddings for text retrieval, GPT-4V for...

12
Experimental
96 montoyitadevelp/multi-vector-image-retrieval

Multimodal approach with memory and speed optimization for RAG systems

12
Experimental
97 Cyangen/RAG

Multimodal RAG

11
Experimental
98 Gabya06/cat_breeds

A multimodal RAG system powered by Gemini and ChromaDB that lets you explore...

11
Experimental
99 anusha-chebolu/multimodal-rag

A multimodal RAG application using Qwen 2.5 VL, ColPali, and QdrantDB for...

11
Experimental
100 YeonwooSung/vision-rag

Vision-based RAG

11
Experimental
101 dongxuecheng/SafetyVision-RAG

AI-Powered Safety Hazard Detection System using VLM and...

10
Experimental
102 SJ9VRF/Multimodal-RAG

Multimodal RAG using LangChain and Vertex AI for advanced document search...

10
Experimental
103 K0EKJE/VLM-Based-Retrieval-Augmented-Generation

Stanford NLP course group project repo. RAG based on VLM retriever for...

10
Experimental
104 Lizhecheng02/MultiModal

Basic implementation code for multimodal models and some applications or...

10
Experimental
105 pynterest83/VDT-UniversalRAG

Multimodal RAG for Vietnamese

10
Experimental
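Several of the highest-ranked entries are built around late-interaction retrieval, including illuin-tech/colpali, AnswerDotAI/byaldi and JuliaGenAI/ColBERT.jl. As a rough sketch of what that means, assume ColBERT-style MaxSim scoring over unit-length vectors:

- A query is a set of token vectors.
- A document is a set of token or image-patch vectors.
- Each query vector is matched to its most similar document vector, and those best-match similarities are summed.

This is an illustration, not any listed project's implementation. The 2-D vectors and page names are hypothetical toy data, and real models use far larger embeddings.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction (MaxSim) score: for every query vector keep its best
    match among the document vectors, then sum those maxima."""
    if not doc_vecs:
        raise ValueError("document has no vectors")
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank_pages(query_vecs: Sequence[Vector],
               pages: Dict[str, Sequence[Vector]]) -> List[str]:
    """Return page ids sorted from highest to lowest MaxSim score."""
    return sorted(pages, key=lambda pid: maxsim(query_vecs, pages[pid]), reverse=True)


# Hypothetical toy data: 2 query-token vectors and 3 "pages" of patch vectors,
# all unit length, so each dot product is a cosine similarity.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.6, 0.8]],  # best matches 1.0 + 0.8
    "page_b": [[0.8, 0.6]],              # best matches 0.8 + 0.6
    "page_c": [[-1.0, 0.0]],             # best matches -1.0 + 0.0
}

print(rank_pages(query, pages))  # ['page_a', 'page_b', 'page_c']
```

Each document keeps one vector per token or patch instead of a single pooled vector, so index size grows with document length. Entries such as DngBack/HPC-ColPali ("Hierarchical Patch Compression for ColPali") appear to target that cost.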