Document Intelligence RAG Embedding Tools

Tools for uploading, searching, and conversationally querying documents (PDFs and other files) using embeddings and semantic search to extract insights and answers. This category does not include code documentation generation, code search, or cross-document fact-checking systems.

47 document intelligence RAG tools are tracked. The highest-rated is haven-jeon/LegalQA, which scores 45/100 and has 97 stars.

Get all 47 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=document-intelligence-rag&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
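The same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not documented here, the parsed JSON is returned unchanged. The `limit` parameter is taken from the curl example above; whether the endpoint honors larger values (for example 47) is an assumption that has not been verified.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="embeddings", subcategory="document-intelligence-rag", limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and return the decoded JSON payload as-is.

    The response structure is not documented in this listing, so no
    fields are assumed; callers should inspect the result themselves.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (performs a network request and counts against the daily quota):
# data = fetch_projects(build_url())
# print(type(data))
```

Keeping URL construction separate from the network call makes it easy to check the request before spending one of the 100 keyless requests per day.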

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | haven-jeon/LegalQA | Korean LegalQA using SentenceKoBART | 45 | Emerging |
| 2 | maxent-ai/ocrpy | OCR, Archive, Index and Search: Implementation agnostic OCR framework. | 44 | Emerging |
| 3 | ametnes/nesis | Your AI Powered Enterprise Knowledge Partner. Designed to be used at scale... | 43 | Emerging |
| 4 | foxminchan/LawKnowledge | A legal knowledge search and Q&A application based on Vietnam's Legal Code... | 40 | Emerging |
| 5 | intel/document-automation | Document Automation Reference Kit | 38 | Emerging |
| 6 | machinelearningZH/document-research-tool | Perform intelligent research over document collections using hybrid search and LLMs. | 37 | Emerging |
| 7 | utachicodes/PyDocEnhancer | An AI-powered Python plugin to enhance documentation with summaries, code... | 36 | Emerging |
| 8 | Schematise-Lex-Data-Analysis/lex-liberalis | A fork of Semantra for Indian court judgments | 34 | Emerging |
| 9 | ryanlane/document-manager | Local-first document archive assistant for semantic search and RAG using... | 33 | Emerging |
| 10 | joe32140/tei-qdrant-cache | Docker Compose stack for scalable TEI embeddings (multi-GPU) fronted by a... | 32 | Emerging |
| 11 | FellowTraveler/ngest | Python script for ingesting various files into a semantic graph. For text,... | 31 | Emerging |
| 12 | kchanda24/hackathon-backend | Enterprise Content Management MVP with semantic search capabilities. Upload... | 30 | Emerging |
| 13 | Leg0shii/smart-documents | A web application that enables users to upload documents and utilize AI... | 30 | Emerging |
| 14 | mcplusa/elastic-ingest-http | This is an Elasticsearch Ingest Pipeline Processor that calls an HTTP(s)... | 30 | Emerging |
| 15 | josego85/pdf-content-search | 🔍 AI-powered PDF search with OCR support for scanned documents, local AI via... | 27 | Experimental |
| 16 | VedantKothari01/DocInsight | AI-powered document originality and plagiarism risk detection system... | 26 | Experimental |
| 17 | moonlitrevery/DodocLens | Document intelligence with local AI (OCR + semantic search) for PDFs and... | 25 | Experimental |
| 18 | HarshilMaks/InsightDocs | AI Document Intelligence System for deep analysis and semantic querying of... | 24 | Experimental |
| 19 | HemalDholakiya12/PDFChat | A web app that allows users to upload PDFs and interact with them through a... | 23 | Experimental |
| 20 | harshsrivastava05/Document-Analyzer | An AI-powered document analysis platform that transforms uploaded files into... | 23 | Experimental |
| 21 | gracee3/qdrant-bge-stack | Local deployment stack for Qdrant vector search with vLLM-served BAAI... | 22 | Experimental |
| 22 | mry0tt4/DocGenie | AI-powered documentation platform that automatically generates, categorizes,... | 22 | Experimental |
| 23 | xhulianokoci/DocCompareAI | ASP.NET Core API for comparing Word documents with AI: text diff, OpenAI... | 22 | Experimental |
| 24 | danilagoleen/vetka-ingest-engine | Ingestion/indexing core for agent systems: scanning, extraction, dependency... | 22 | Experimental |
| 25 | Tonemon/StaxRead | Self-hosted semantic search over your own documents. Your own self-hosted... | 21 | Experimental |
| 26 | LeonKiptoo/document-intelligence-engine | A document intelligence system that enables semantic question answering over... | 21 | Experimental |
| 27 | ashankgupta/docai | DocAI is a Go-based toolkit that enables intelligent interaction with your... | 21 | Experimental |
| 28 | ventz/pdf-semantic-keyword-analysis | High-performance PDF Semantic keyword analysis tool using AI for intelligent... | 20 | Experimental |
| 29 | JacobPolloreno/OfficeAnswers | Get to the real work by using neural information retrieval for company information. | 19 | Experimental |
| 30 | KaramelBytes/docloom-cli | AI‑augmented document analysis and lightweight retrieval (Go) with... | 17 | Experimental |
| 31 | cosmanBrenden/DocumentMuncher | DocumentMuncher is a locally running document search engine that allows you... | 17 | Experimental |
| 32 | KaavyaGala546/DocuMind-AI | DocuMind-AI is an AI-powered document assistant that allows users to upload... | 15 | Experimental |
| 33 | akbar-ops/sistema-de-analisis-de-documentos-juridicos | 📄 Analyze, classify, and search legal documents with advanced NLP techniques... | 15 | Experimental |
| 34 | David-mwas/vidmindAI | VIDMIND is a system designed to automatically summarize, analyze, and... | 14 | Experimental |
| 35 | Helixo613/docforensics | Cross-document contradiction and agreement detection for PDF collections... | 14 | Experimental |
| 36 | Irshad-11/PDF-INSIGHTS | Smart PDF Analyzer with OCR and Semantic Search | 13 | Experimental |
| 37 | tstephx/book-ingestion-python | Book ingestion pipeline for processing PDF/EPUB into searchable chapters... | 13 | Experimental |
| 38 | mamoon-17/DocuQuery | DocuQuery, a minimal RAG demo: upload PDFs, generate local embeddings,... | 13 | Experimental |
| 39 | bivex/qdrant_streamlit_generator_via_groq | 🔍 QDRANT + STREAMLIT + GROQ = VECTOR SEARCH UI. Explore embeddings.... | 13 | Experimental |
| 40 | devinitive-team/mirage | 🏜️ Mirage: Universal, relevance search over PDF documents at any scale.... | 13 | Experimental |
| 41 | maharishiayurveda/DocQuify | Extract insights from research papers with DocQuify. Upload PDFs and ask... | 13 | Experimental |
| 42 | KishoreMuruganantham/HackRx-6.0-Intelligent-Query-Retrieval | LLM-powered system for intelligent query–retrieval from large documents in... | 13 | Experimental |
| 43 | kstv364/intellidoc | Hackathon project - Intellidoc - ECM MVP with semantic search capabilities.... | 13 | Experimental |
| 44 | dyannadle/AI-Powered-Search-Over-Noation | An AI-powered document search engine that connects to Notion and Google... | 13 | Experimental |
| 45 | hlw-aryan/DocuMate | Unlock the true potential of your document assets with DocuMate's... | 11 | Experimental |
| 46 | Mielone2Good/DocVision-AI | Intelligent PDF Document Understanding System with semantic document search... | 10 | Experimental |
| 47 | naKarthikSurya/Legal-AI-Model | An AI-powered Legal Information Retrieval System for Indian Laws and Court... | 10 | Experimental |
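The listing does not say how tiers are assigned, but the score range of each tier can be read off the rows above. The sketch below tallies the 47 listed scores by tier; the scores are transcribed from this listing, while the names `EMERGING_SCORES`, `EXPERIMENTAL_SCORES`, and `summarize_tiers` are hypothetical. Any cutoff inferred from these ranges is an observation about this snapshot, not a documented rule.

```python
from collections import defaultdict

# Scores transcribed from the listing above, in rank order.
EMERGING_SCORES = [45, 44, 43, 40, 38, 37, 36, 34, 33, 32, 31, 30, 30, 30]
EXPERIMENTAL_SCORES = [
    27, 26, 25, 24, 23, 23, 22, 22, 22, 22, 21, 21, 21, 20, 19, 17, 17,
    15, 15, 14, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 11, 10, 10,
]

ROWS = [(score, "Emerging") for score in EMERGING_SCORES] + [
    (score, "Experimental") for score in EXPERIMENTAL_SCORES
]


def summarize_tiers(rows):
    """Return the count, minimum score, and maximum score for each tier."""
    by_tier = defaultdict(list)
    for score, tier in rows:
        by_tier[tier].append(score)
    return {
        tier: {"count": len(scores), "min": min(scores), "max": max(scores)}
        for tier, scores in by_tier.items()
    }


summary = summarize_tiers(ROWS)
for tier, stats in summary.items():
    print(tier, stats)
```

In this snapshot the lowest Emerging score is 30 and the highest Experimental score is 27, so the boundary between the two tiers falls somewhere in that gap.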