mindee/doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
This project helps anyone who needs to extract text from documents like PDFs, images, or even webpages. It takes your document files as input and outputs the identified text, including its location on the page. You can use it to convert scanned documents into searchable and editable text.
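The text-plus-location output described above can be pictured with a minimal sketch. It assumes the OCR result has been exported to a nested dictionary of pages, blocks, lines, and words, where each word carries a `value` and a `geometry` of relative coordinates. This layout matches what docTR's `export()` is understood to return, but the helper `iter_words` and the sample data are hypothetical and written only for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical helper (not part of docTR): flattens an exported OCR result
# into (page index, text, box) tuples. The dict layout is an assumption.
def iter_words(export: Dict[str, Any]) -> List[Tuple[int, str, tuple]]:
    words = []
    for page in export.get("pages", []):
        page_idx = page.get("page_idx", 0)
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    # Keep the geometry as a tuple of (x, y) points, so both
                    # two-corner boxes and four-point polygons pass through.
                    box = tuple(tuple(pt) for pt in word["geometry"])
                    words.append((page_idx, word["value"], box))
    return words

# Hand-written sample standing in for a real result. With docTR installed,
# a real dict would typically come from something like:
#   from doctr.io import DocumentFile
#   from doctr.models import ocr_predictor
#   result = ocr_predictor(pretrained=True)(DocumentFile.from_pdf("file.pdf"))
#   export = result.export()
sample_export = {
    "pages": [
        {
            "page_idx": 0,
            "blocks": [
                {"lines": [{"words": [
                    {"value": "Hello", "confidence": 0.99,
                     "geometry": ((0.10, 0.10), (0.30, 0.15))},
                    {"value": "world", "confidence": 0.97,
                     "geometry": ((0.32, 0.10), (0.50, 0.15))},
                ]}]}
            ],
        }
    ]
}

found = iter_words(sample_export)
for page_idx, text, box in found:
    print(page_idx, text, box)
```

Flattening to tuples keeps the location next to each word, which is usually enough to build a searchable index or to overlay boxes on the page image.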
5,956 stars. Actively maintained with 1 commit in the last 30 days.
Use this if you need to reliably extract textual information from various document types, including handling rotated pages, and want the flexibility to choose specific text detection and recognition models.
Not ideal if you only need basic, straightforward text extraction without needing advanced control over model architectures or detailed output like bounding box coordinates.
Stars: 5,956
Forks: 627
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 09, 2026
Commits (30d): 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/mindee/doctr"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
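The same request can be sketched in Python using only the standard library. The path pattern `category/owner/repo` is inferred from the single URL shown above, and the response is assumed to be JSON; neither is confirmed by the page. The function names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern category/owner/repo.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)

def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response. Assumes the body is JSON."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

url = build_quality_url("ml-frameworks", "mindee", "doctr")
print(url)
# A network call is left to the reader, e.g.:
#   data = fetch_quality("ml-frameworks", "mindee", "doctr")
```

With the keyless limit of 100 requests/day, caching responses locally is a sensible default for anything that polls more than a handful of repositories.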
Related frameworks
JaidedAI/EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin,...
breezedeus/CnSTD
CnSTD: PyTorch/MXNet-based Chinese/English Scene Text Detection, Mathematical Formula Detection...
githubharald/SimpleHTR
Handwritten Text Recognition (HTR) system implemented with TensorFlow.
felixdittrich92/OnnxTR
OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for seamless,...
parlance/ctcdecode
PyTorch CTC Decoder bindings