The Document AI Directory
A quality-scored directory of 154 Document AI tools, updated daily. Every tool is scored on maintenance, adoption, maturity, and community signals.
Document parsing and extraction tools for AI pipelines: OCR engines, PDF parsers, table extractors, and the plumbing that turns unstructured documents into structured data.

| Score range | Tools |
|---|---|
| 70–100 | 4 |
| 50–69 | 57 |
| 30–49 | 62 |
| 10–29 | 31 |

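
As a rough illustration of how the band counts above could be derived from per-tool scores, here is a minimal Python sketch. The band boundaries come from the summary; the function names (`band_for`, `band_counts`) and the sample scores are hypothetical, and the directory's actual scoring pipeline is not described here.

```python
from collections import Counter

# Score bands as listed in the directory summary (inclusive bounds).
BANDS = [(70, 100), (50, 69), (30, 49), (10, 29)]


def band_for(score):
    """Return the 'lo–hi' label for a score, or None if it fits no listed band."""
    for lo, hi in BANDS:
        if lo <= score <= hi:
            return f"{lo}–{hi}"
    return None  # assumption: scores outside 10–100 are simply left unbanded


def band_counts(scores):
    """Count how many scores fall into each band, keeping the listed band order."""
    counts = Counter(band_for(s) for s in scores)
    return {f"{lo}–{hi}": counts.get(f"{lo}–{hi}", 0) for lo, hi in BANDS}


# Hypothetical scores, for illustration only (not real directory data).
sample_scores = [82, 64, 55, 41, 33, 12]
print(band_counts(sample_scores))
```

The four published counts (4, 57, 62, 31) add up to the 154 tools stated in the summary, which is consistent with every listed tool falling into exactly one band.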
Top tools by quality score
| # | Tool | Score |
|---|---|---|
| 1 | opendatalab/MinerU: Transforms complex documents like PDFs into LLM-ready markdown/JSON for your... | |
| 2 | mehmet-kozan/pdf-parse: Pure TypeScript, cross-platform module for extracting text, images, and... | |
| 3 | HIllya51/LunaTranslator: Visual Novel Translator | |
| 4 | ShareX/ShareX: ShareX is a free and open-source application that enables users to capture... | |
| 5 | btwld/docling-sdk: A TypeScript SDK for Docling - Bridge between the Python Docling ecosystem... | |
| 6 | STranslate/STranslate: A ready-to-go translation and OCR tool developed with WPF | |
| 7 | tisfeng/Easydict: A clean, elegant dictionary and translation macOS app. Works out of the box, supports offline OCR, Youdao Dictionary, 🍎 Apple System Dictionary, 🍎... | |
| 8 | zclucas/RMT: RMT (RuoMengTu) is a free, open-source macro tool built on AHKv2. Let the... | |
| 9 | readur/readur: Quick, painless, intuitive OCR platform written in Rust and TypeScript.... | |
| 10 | pymupdf/PyMuPDF: PyMuPDF is a high performance Python library for data extraction, analysis,... | |
| 11 | run-llama/llama-cloud-py: Python SDK for OCR and document parsing in the cloud with LlamaParse | |
| 12 | TheJoeFin/Text-Grab: Use OCR in Windows quickly and easily with Text Grab. With optional... | |
| 13 | docling-project/docling: Get your documents ready for gen AI | |
| 14 | ocrmypdf/OCRmyPDF: OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched | |
| 15 | RapidAI/RapidOCR: 📄 Awesome OCR multiple programming languages toolkits based on ONNX Runtime,... | |
| 16 | bpwhelan/GameSentenceMiner: An immersion toolkit for learning languages through games and other visual media. | |
| 17 | datalab-to/chandra: OCR model that handles complex tables, forms, handwriting with full layout. | |
| 18 | xushengfeng/eSearch: Screenshot, offline OCR, search and translate, reverse image search, image pinning, screen recording, omnidirectional scrolling screenshot, screen translation ... | |
| 19 | run-llama/liteparse: A fast, helpful, and open-source document parser | |
| 20 | zai-org/GLM-OCR: GLM-OCR: Accurate × Fast × Comprehensive | |