The Document AI Directory

Quality-scored directory of 154 document AI tools, updated daily. Every tool is scored on maintenance, adoption, maturity, and community signals.

Document parsing and extraction tools for AI pipelines — OCR engines, PDF parsers, table extractors, and the plumbing that turns unstructured documents into structured data.

| Tier | Tools | Score range |
|---|---|---|
| Verified | 4 | 70–100 |
| Established | 57 | 50–69 |
| Emerging | 62 | 30–49 |
| Experimental | 31 | 10–29 |
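The tier boundaries listed above can be read as a simple lookup from score to tier. The following is a minimal sketch, assuming scores are integers and that the ranges are inclusive at both ends; the names `TIERS` and `tier_for_score` are hypothetical and not part of the directory itself. The directory does not state how scores below 10 are labeled, so the sketch returns `None` for them.

```python
from typing import Optional

# Tier boundaries as listed in the directory: (min score, max score, tier name).
# Assumption: both ends of each range are inclusive.
TIERS = [
    (70, 100, "Verified"),
    (50, 69, "Established"),
    (30, 49, "Emerging"),
    (10, 29, "Experimental"),
]


def tier_for_score(score: int) -> Optional[str]:
    """Return the tier name for a quality score, or None if no tier is listed."""
    for low, high, name in TIERS:
        if low <= score <= high:
            return name
    # The directory does not say how scores outside 10-100 are labeled.
    return None


if __name__ == "__main__":
    for s in (80, 69, 30, 5):
        print(s, tier_for_score(s))
```

Under these assumptions, a tool scoring 69 falls in the Established tier while one scoring 71 is Verified, which matches the four tools at 70 or above in the ranking.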

Top tools by quality score

| # | Tool | Description | Score |
|---|---|---|---|
| 1 | opendatalab/MinerU | Transforms complex documents like PDFs into LLM-ready markdown/JSON for your... | 80 |
| 2 | mehmet-kozan/pdf-parse | Pure TypeScript, cross-platform module for extracting text, images, and... | 76 |
| 3 | HIllya51/LunaTranslator | Visual Novel Translator | 71 |
| 4 | ShareX/ShareX | ShareX is a free and open-source application that enables users to capture... | 71 |
| 5 | btwld/docling-sdk | A TypeScript SDK for Docling - Bridge between the Python Docling ecosystem... | 69 |
| 6 | STranslate/STranslate | A ready-to-go translation and OCR tool developed with WPF | 69 |
| 7 | tisfeng/Easydict | A clean, elegant dictionary and translation macOS app. Works out of the box, supports offline OCR, Youdao Dictionary, 🍎 Apple System Dictionary, 🍎... | 68 |
| 8 | zclucas/RMT | RMT (RuoMengTu) is a free, open-source macro tool built on AHKv2. Let the... | 68 |
| 9 | readur/readur | Quick, painless, intuitive OCR platform written in Rust and TypeScript.... | 68 |
| 10 | pymupdf/PyMuPDF | PyMuPDF is a high performance Python library for data extraction, analysis,... | 68 |
| 11 | run-llama/llama-cloud-py | Python SDK for OCR and document parsing in the cloud with LlamaParse | 67 |
| 12 | TheJoeFin/Text-Grab | Use OCR in Windows quickly and easily with Text Grab. With optional... | 67 |
| 13 | docling-project/docling | Get your documents ready for gen AI | 67 |
| 14 | ocrmypdf/OCRmyPDF | OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched | 67 |
| 15 | RapidAI/RapidOCR | 📄 Awesome OCR multiple programming languages toolkits based on ONNX Runtime,... | 66 |
| 16 | bpwhelan/GameSentenceMiner | An immersion toolkit for learning languages through games and other visual media. | 65 |
| 17 | datalab-to/chandra | OCR model that handles complex tables, forms, handwriting with full layout. | 65 |
| 18 | xushengfeng/eSearch | Screenshot, offline OCR, search and translate, reverse image search, pin image, screen recording, omnidirectional scrolling screenshot, screen translation ... | 65 |
| 19 | run-llama/liteparse | A fast, helpful, and open-source document parser | 64 |
| 20 | zai-org/GLM-OCR | GLM-OCR: Accurate × Fast × Comprehensive | 64 |

Browse by category