mindee/doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Score: 59 / 100 (Established)

This project helps anyone who needs to extract text from documents like PDFs, images, or even webpages. It takes your document files as input and outputs the identified text, including its location on the page. You can use it to convert scanned documents into searchable and editable text.

5,956 stars. Actively maintained with 1 commit in the last 30 days.

Use this if you need to reliably extract textual information from various document types, including handling rotated pages, and want the flexibility to choose specific text detection and recognition models.

Not ideal if you only need basic text extraction and have no use for control over model architectures or for detailed output such as bounding box coordinates.
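As a rough illustration of the flow described above (a document file goes in, text with page locations comes out, and the detection and recognition models are selectable), here is a minimal sketch. It assumes docTR is installed as the `python-doctr` package with a deep learning backend. The helper names `run_ocr` and `iter_words` are hypothetical, and the architecture names `db_resnet50` and `crnn_vgg16_bn` are just one possible pairing. The exact shape of each word's `geometry` depends on whether pages are treated as straight or rotated, so the sketch passes it through unchanged.

```python
from typing import Any, Dict, Iterator, Tuple


def run_ocr(path: str, rotated: bool = False) -> Dict[str, Any]:
    """Run docTR on a PDF or image file and return the exported result as a dict."""
    # Imported lazily so iter_words() below can be used without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    if path.lower().endswith(".pdf"):
        doc = DocumentFile.from_pdf(path)
    else:
        doc = DocumentFile.from_images(path)

    # The architecture names are an illustrative choice, not a recommendation.
    predictor = ocr_predictor(
        det_arch="db_resnet50",
        reco_arch="crnn_vgg16_bn",
        pretrained=True,
        assume_straight_pages=not rotated,  # False lets docTR handle rotated pages
    )
    return predictor(doc).export()


def iter_words(export: Dict[str, Any]) -> Iterator[Tuple[int, str, Any]]:
    """Yield (page index, word text, geometry) from an exported docTR result.

    Assumes the nested pages -> blocks -> lines -> words layout of the export.
    """
    for page_idx, page in enumerate(export.get("pages", [])):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    yield page_idx, word["value"], word["geometry"]
```

A call such as `for page, text, box in iter_words(run_ocr("scan.pdf", rotated=True)): ...` would then walk every recognized word; `scan.pdf` is a placeholder path.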

document-processing data-extraction digitization information-retrieval text-recognition
No Package, No Dependents
Maintenance 13 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 20 / 25


Stars: 5,956
Forks: 627
Language: Python
License: Apache-2.0
Last pushed: Mar 09, 2026
Commits (30d): 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/mindee/doctr"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
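The same request can be made from Python with only the standard library. This is a minimal sketch under two assumptions not stated above: that the URL path follows a category/owner/repo pattern (inferred from the single example URL) and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, assuming a category/owner/repo path layout."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> Any:
    """Fetch and decode the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("ml-frameworks", "mindee", "doctr")` issues the same request as the curl command. Since the keyless tier is limited to 100 requests/day, caching the result locally is a sensible default.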