mindee/doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Established

This project helps anyone who needs to extract text from documents like PDFs, images, or even webpages. It takes your document files as input and outputs the identified text, including its location on the page. You can use it to convert scanned documents into searchable and editable text.
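The input-to-output flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's reference usage: `iter_words` and `ocr_pdf` are hypothetical helper names, and the sketch assumes docTR's documented `DocumentFile.from_pdf`, `ocr_predictor`, and `export()` calls, with the exported dictionary nested as pages, blocks, lines, and words, and word geometry given as relative `((xmin, ymin), (xmax, ymax))` coordinates on straight pages. Details may differ between docTR versions.

```python
from typing import Any, Dict, Iterator, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixels


def iter_words(export: Dict[str, Any]) -> Iterator[Tuple[int, str, float, Box]]:
    """Yield (page_index, text, confidence, pixel_box) for every word.

    Assumption: `export` follows the nested pages -> blocks -> lines -> words
    layout, and each word's geometry is a relative
    ((xmin, ymin), (xmax, ymax)) pair, as produced for straight pages.
    """
    for page in export["pages"]:
        height, width = page["dimensions"]  # assumed order: (height, width)
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    (x0, y0), (x1, y1) = word["geometry"]
                    # Scale relative coordinates to pixel coordinates.
                    box = (round(x0 * width), round(y0 * height),
                           round(x1 * width), round(y1 * height))
                    yield page["page_idx"], word["value"], word["confidence"], box


def ocr_pdf(path: str) -> Dict[str, Any]:
    """Hypothetical wrapper: run docTR on a PDF and return the exported dict."""
    # Imported lazily so iter_words can be used without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    pages = DocumentFile.from_pdf(path)
    predictor = ocr_predictor(pretrained=True)
    return predictor(pages).export()
```

With docTR installed, `for page_idx, text, conf, box in iter_words(ocr_pdf("scan.pdf")): ...` would walk every recognized word together with its position, where `scan.pdf` is a placeholder path.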

5,956 stars. Actively maintained with 1 commit in the last 30 days.

Use this if you need to reliably extract textual information from various document types, including handling rotated pages, and want the flexibility to choose specific text detection and recognition models.
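Choosing models and handling rotated pages might look like the sketch below. The helper names `predictor_options` and `build_predictor` are hypothetical, and the defaults shown are this sketch's choices rather than the library's. It assumes `ocr_predictor` accepts the `det_arch`, `reco_arch`, `pretrained`, and `assume_straight_pages` keyword arguments described in docTR's documentation, and that `db_resnet50` and `crnn_vgg16_bn` are available architecture names in the installed version.

```python
from typing import Any, Dict


def predictor_options(det_arch: str = "db_resnet50",
                      reco_arch: str = "crnn_vgg16_bn",
                      rotated_pages: bool = False) -> Dict[str, Any]:
    """Collect keyword arguments for docTR's ocr_predictor (hypothetical helper)."""
    return {
        "det_arch": det_arch,      # text detection architecture
        "reco_arch": reco_arch,    # text recognition architecture
        "pretrained": True,        # load pretrained weights
        # Rotated input means pages must not be assumed straight.
        "assume_straight_pages": not rotated_pages,
    }


def build_predictor(**choices: Any):
    """Create a predictor from the chosen options (requires docTR)."""
    # Imported lazily so predictor_options works without docTR installed.
    from doctr.models import ocr_predictor

    return ocr_predictor(**predictor_options(**choices))
```

For example, `build_predictor(rotated_pages=True)` would request a predictor that does not assume straight pages; other architecture names can be passed through `det_arch` and `reco_arch` if the installed docTR version provides them.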

Not ideal if you only need basic, straightforward text extraction and do not require control over model architectures or detailed output such as bounding box coordinates.

document-processing data-extraction digitization information-retrieval text-recognition

No Package, No Dependents

Maintenance 13 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 20 / 25

Stars: 5,956
Forks: 627
Language: Python
License: Apache-2.0

Related frameworks

JaidedAI/EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin,...

breezedeus/CnSTD

CnSTD: PyTorch/MXNet-based Chinese/English scene text detection (Scene Text Detection), mathematical formula detection (Mathematical Formula...

githubharald/SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.

felixdittrich92/OnnxTR

OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for seamless,...

parlance/ctcdecode

PyTorch CTC Decoder bindings
