PDF Document Processing RAG Tools
Tools and systems for extracting, parsing, and retrieving information from PDF documents through OCR, layout analysis, and structured data conversion. Does NOT include general chatbots, multi-source document handling beyond PDFs, or chat interfaces built on top of processed PDFs.
65 PDF document processing tools are tracked. Two score above 50 (Established tier). The highest-rated is thiswillbeyourgithub/wdoc at 60/100 with 510 stars.
Get all 65 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=pdf-document-processing&limit=20"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
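The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, the response shape is not documented on this page (so the parsed JSON is returned untouched), and whether the API accepts a `limit` above the 20 shown in the curl example is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and query parameters are copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="rag", subcategory="pdf-document-processing", limit=20):
    """Build the dataset URL (hypothetical helper, not part of any SDK)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=10):
    """Fetch and parse the JSON payload.

    The response schema is not described on this page, so the parsed
    object is returned as is. Values of `limit` above 20 are an
    assumption and may be rejected or capped by the server.
    """
    with urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the URL only; call fetch_projects() to hit the network.
    print(build_url())
```

Without a key, each call counts against the 100 requests/day quota, so caching the response locally is a reasonable precaution.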
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | thiswillbeyourgithub/wdoc<br>Summarize and query from a lot of heterogeneous documents. Any LLM provider,... | | Established |
| 2 | Arterning/DeepParseX<br>DeepParseX is a powerful multimodal document parsing and knowledge management platform that supports PDF, Word, Excel, PPT, images, video, audio... | | Established |
| 3 | NoEdgeAI/pdfdeal<br>A Python wrapper for the Doc2X API that comes with native text processing... | | Emerging |
| 4 | laxmimerit/RAGWire<br>Production-grade RAG toolkit: ingest PDFs, DOCX, XLSX into Qdrant with LLM... | | Emerging |
| 5 | David-Lolly/ViewRAG<br>A PDF RAG system that combines text and images: layout-aware chunking, deep understanding of figures and tables, and precise visual source tracing. Multimodal PDF RAG: Features... | | Emerging |
| 6 | atpuxiner/docsloader<br>This is a document loader. (Document parsing loader, RAG document parsing, RAG knowledge base construction) | | Emerging |
| 7 | 3DCF-Labs/doc2dataset<br>3DCF / doc2dataset: token-efficient document layer with NumGuard numeric... | | Emerging |
| 8 | preprocess-co/rag-document-viewer<br>RAG Document Viewer is an open-source library that generates high-fidelity... | | Emerging |
| 9 | zzstoatzz/raggy<br>scraping and querying documents for LLMs | | Emerging |
| 10 | ManiAm/RAG-Mail<br>RAG-Mail is a thread-aware email processing system that semantically indexes... | | Emerging |
| 11 | e-kotov/rdocdump<br>rdocdump: Dump ‘R’ Package Source, Documentation, and Vignettes into One File | | Emerging |
| 12 | salameaz/pdf-process-rag<br>A Python-based application that extracts and processes PDF content using a... | | Emerging |
| 13 | antoninomariarizzo/rag<br>A Python library for Retrieval-Augmented Generation (RAG) that extracts text... | | Emerging |
| 14 | MalayAgr/bookacle<br>bookacle is a RAPTOR-based RAG application to aid in understanding complex... | | Emerging |
| 15 | MohammedNasserAhmed/RAGPost<br>RAGPost is an intelligent blog post generator that leverages... | | Emerging |
| 16 | AKSHAYINDIA05/Document_Comparison_System<br>Implement a Retrieval Augmented Generation (RAG) with a user interface for... | | Experimental |
| 17 | natanhp/PythoRAG<br>PythoRAG is a simple, open-source project designed to facilitate... | | Experimental |
| 18 | iamarunbrahma/rag-ingest<br>RAG-Ingest: A tool for converting PDFs to markdown and indexing them for... | | Experimental |
| 19 | Besthope-Official/predoc<br>Document preprocessing service for RAG (Retrieval-Augmented Generation) | | Experimental |
| 20 | ParthSareen/simple-rag<br>Too many docs? Quickly search over any PDF or Markdown documents | | Experimental |
| 21 | SStephanJX/Snowflake-RAG-System<br>Production-ready Snowflake RAG system with type-specific chunking | | Experimental |
| 22 | liunian-Jay/MU-GOT<br>PDF Parsing Tool: GOT's vLLM acceleration implementation, MinerU for layout... | | Experimental |
| 23 | juhaodong/large-file-translator<br>Extract the content while preserving the layout, images, and tables. Perform... | | Experimental |
| 24 | este6an13/checks-ocr<br>Software that applies OCR + RAG to extract bank check information | | Experimental |
| 25 | lolbigtime/Folio<br>Zero-config Swifty RAG toolkit for iOS & macOS: PDF/text loaders, universal... | | Experimental |
| 26 | salim-lakhal/rag-document-pipeline<br>Production RAG pipeline: multi-format document extraction → intelligent... | | Experimental |
| 27 | Nexialism-Friday/hwpx-toolkit<br>HWP/HWPX document processing toolkit: extraction, generation, vectorization... | | Experimental |
| 28 | slvg01/90.10d_RAG_OnTheFly<br>An app for uploading files (ppt, doc, pdf, zip) and running RAG on their content | | Experimental |
| 29 | Vibhuarvind/Content-Engine-RAG-for-PDF<br>Content Engine is a RAG system that analyzes and compares multiple PDF... | | Experimental |
| 30 | FrostWillmott/FinDocBot<br>Modern RAG, designed for semantic search and question-answering over... | | Experimental |
| 31 | yotaken/docuggez<br>Automatic project documentator | | Experimental |
| 32 | JochiRaider/sievio<br>Sievio turns GitHub, local repos, and web PDFs into clean JSONL for LLM... | | Experimental |
| 33 | JuliaGenAI/DocsScraper.jl<br>Efficient RAG knowledge pack creator from online Julia documentation | | Experimental |
| 34 | Clearedge-AI/clearedge<br>Build a RAG preprocessing pipeline | | Experimental |
| 35 | S0lkar/IntGathering-x-RAG--BlazingDocs<br>RAG-based tool for document batch querying. | | Experimental |
| 36 | silas-rickards/PDF-LLM-RAG<br>A RAG pipeline specialized for local PDFs. | | Experimental |
| 37 | sfkunal/librarian<br>Librarian is a RAG-assisted LLM application that allows any user to query... | | Experimental |
| 38 | A-Najjar/rag-factory<br>Modular RAG system with Factory Pattern - Load PDF/Word docs, configure... | | Experimental |
| 39 | husaynirfan1/PullData<br>RAG with response in what you need. Output directly with supported format... | | Experimental |
| 40 | solomonjie/rag-processor<br>RAG index pipeline, from raw data cleaning to indexing. Each step communicates via... | | Experimental |
| 41 | alrafiabdullah/doc_rag<br>Document RAG with HuggingFace Token | | Experimental |
| 42 | yagmur-kurtbas/pdf-rag-pipeline<br>A RAG pipeline for PDF question answering using LangChain, ChromaDB and Groq... | | Experimental |
| 43 | ahmad-albasha/DataForg<br>PDF to JSON pipeline with intelligent bilingual chunking (AR/EN) and a fully... | | Experimental |
| 44 | ritheesh-dev/Local-PDF-RAG-System<br>Privacy-first local PDF RAG system using FAISS + Ollama: fully offline,... | | Experimental |
| 45 | avocatt/ocr-rag-highlighted-viewer<br>OCR + RAG document viewer with highlighted search results | | Experimental |
| 46 | 2dogsandanerd/rag_pdf_audit<br>Tool to compare PDF extraction methods | | Experimental |
| 47 | fllin1/mawa<br>RAG workflow (Mistral OCR + Gemini) for complex regulatory PDFs.... | | Experimental |
| 48 | julicq/PDF-RAG-Query<br>RAG model for PDF database | | Experimental |
| 49 | shivkhurana/technical-docs-rag-pipeline<br>Enterprise-grade RAG (Retrieval Augmented Generation) pipeline using... | | Experimental |
| 50 | will695672804/graphrag-engineering-pdfs<br>🔍 Extract entities and build knowledge graphs from large engineering PDFs,... | | Experimental |
| 51 | andersborgabiro/RagQueryDocuments<br>RAG application that makes it easy to search in multiple documents | | Experimental |
| 52 | ashwyan/local-llm-pdf-analyzer<br>A local AI tool using Ollama (Llama 3) to analyze PDF documents and generate... | | Experimental |
| 53 | Qinnovation123/papers<br>PDF embedding workflow | | Experimental |
| 54 | adrianizmi/Simple-RAG<br>Minimalist RAG system built from scratch using Python, local embeddings, and... | | Experimental |
| 55 | mshojaei77/DataSpeakGPT<br>Read files and images and retrieve data for LLM | | Experimental |
| 56 | zenmakhlouf/arabic-rag-pipeline<br>A single-file RAG pipeline for Arabic PDF lectures with two-stage retrieval,... | | Experimental |
| 57 | nkarast/ask-my-pdf<br>A RAG application using local LLM to answer questions given a PDF. | | Experimental |
| 58 | siddharth-nandagopal/billionaires-rag-query<br>Billionaires RAG Query uses LLMs and a RAG framework to analyze the world's... | | Experimental |
| 59 | bazilicum/pdf-query<br>This project processes and retrieves information from PDF file or PDF... | | Experimental |
| 60 | zhangshi0512/DevTools<br>A lightweight Python-based Software Package for daily use | | Experimental |
| 61 | AlinaBaber/Document-Analysis-Identification-with-RAG-Vector-Database-and-Mistral-LLM<br>This Document Analysis pipeline is a comprehensive document analysis system,... | | Experimental |
| 62 | pvmodayil/ragyphi<br>An entire RAG (Retrieval-Augmented Generation) pipeline library designed to... | | Experimental |
| 63 | swax10/anaya<br>Anaya is a Content Engine that specializes in analyzing and comparing... | | Experimental |
| 64 | SuchitG04/multi_doc_rag<br>RAG application to query multiple docs. Built to query 10K reports of companies. | | Experimental |
| 65 | ITSAIDI/RAGify<br>RAGify is a Retrieval-Augmented Generation (RAG) application designed to... | | Experimental |