tesseract-ocr/tesseract

Tesseract Open Source OCR Engine (main repository)

Score: 61 / 100 (Established)

Tesseract OCR converts scanned documents, photos, and other images containing text into editable, searchable digital text. You provide an image file (PDFs must first be rasterized to images, since Tesseract does not read PDF input directly), and it outputs the recognized text in formats such as plain text, hOCR, TSV, or searchable PDF. This tool suits anyone who needs to extract text from images for archiving, analysis, or further processing.
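A minimal sketch of that image-in, text-out workflow, assuming a recent `tesseract` CLI is on `PATH`. `ocr_image`, `ocr_pdf`, and the `DRY_RUN` switch are hypothetical helpers, not part of Tesseract; the calls follow the CLI's documented `tesseract IMAGE OUTPUTBASE [options...] [configfile...]` form, where the `txt` and `pdf` config names pick the output format. Because Tesseract reads raster images rather than PDF files, `ocr_pdf` first rasterizes pages with Poppler's `pdftoppm`, an extra dependency assumed here.

```shell
# Hypothetical helpers around the tesseract CLI (assumes tesseract is on PATH).

# OCR one image into OUTBASE.txt or OUTBASE.pdf (tesseract appends the extension).
# Usage: ocr_image IMAGE OUTBASE [txt|pdf] [LANG]
ocr_image() {
    img=$1
    outbase=$2
    format=${3:-txt}
    lang=${4:-eng}    # needs the matching .traineddata model installed

    case $format in
        txt|pdf) ;;
        *) echo "unsupported format: $format" >&2; return 2 ;;
    esac

    # Documented form: tesseract IMAGE OUTPUTBASE [options...] [configfile...]
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo tesseract "$img" "$outbase" -l "$lang" "$format"   # print only
    else
        tesseract "$img" "$outbase" -l "$lang" "$format"
    fi
}

# OCR a PDF into one searchable OUTBASE.pdf (assumes pdftoppm from Poppler).
# Usage: ocr_pdf INPUT.pdf OUTBASE [LANG]
ocr_pdf() {
    pdf=$1
    outbase=$2
    lang=${3:-eng}
    tmp=$(mktemp -d) || return 1

    # Rasterize every page at 300 DPI to tmp/page-N.png; pdftoppm zero-pads N
    # to the width of the last page number, so the glob below sorts in page order.
    pdftoppm -r 300 -png "$pdf" "$tmp/page" || { rm -rf "$tmp"; return 1; }

    # tesseract also accepts a text file listing one image path per line,
    # which produces a single multi-page output.
    for page in "$tmp"/page-*.png; do
        printf '%s\n' "$page"
    done > "$tmp/pages.txt"

    tesseract "$tmp/pages.txt" "$outbase" -l "$lang" pdf
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

For example, `ocr_image receipt.jpg receipt pdf` should write `receipt.pdf`, while `DRY_RUN=1 ocr_image receipt.jpg receipt` only prints the command it would run.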

72,883 stars. Maintained, with 1 commit in the last 30 days.

Use this if you need to accurately extract text from scanned documents, photographs, or other image files in over 100 languages.
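Recognition languages are selected with Tesseract's `-l` option, and several can be combined with `+` (for example `-l eng+deu`); each code needs its `.traineddata` model installed, which `tesseract --list-langs` reports. A minimal sketch under those assumptions, where `join_langs` and `check_langs` are hypothetical helpers rather than part of Tesseract:

```shell
# Hypothetical helper: join language codes into a value for tesseract's -l option.
# Usage: join_langs eng deu fra   (prints eng+deu+fra)
join_langs() {
    out=""
    for code in "$@"; do
        if [ -z "$out" ]; then out=$code; else out="$out+$code"; fi
    done
    printf '%s\n' "$out"
}

# Hypothetical helper: succeed only if every code is listed by `tesseract --list-langs`.
check_langs() {
    # Skip the "List of available languages ..." header line.
    installed=$(tesseract --list-langs 2>&1 | tail -n +2)
    for code in "$@"; do
        if ! printf '%s\n' "$installed" | grep -qx "$code"; then
            echo "missing traineddata for: $code" >&2
            return 1
        fi
    done
}
```

For example, `check_langs eng deu && tesseract page.png page -l "$(join_langs eng deu)"` would OCR a mixed English/German page only when both models are present.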

Not ideal if you need a graphical user interface (Tesseract is a command-line tool and library) or expect perfect recognition from poor-quality images without prior enhancement.

document-digitization data-entry-automation text-extraction digital-archiving content-conversion
No package, no dependents
Maintenance 13 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 22 / 25
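The four subscores, each out of 25, add up exactly to the overall 61 / 100 (13 + 10 + 16 + 22 = 61). That suggests the overall score is a plain sum of the four categories, though the page itself does not say so; this sketch simply encodes that assumption, and `score_total` is a hypothetical helper.

```shell
# Assumption: overall score = sum of the four 25-point subscores.
# Usage: score_total MAINTENANCE ADOPTION MATURITY COMMUNITY
score_total() {
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    echo "$total"
}

score_total 13 10 16 22   # prints 61
```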


Stars: 72,883
Forks: 10,541
Language: C++
License: Apache-2.0
Category: latex-ocr-tools
Last pushed: Feb 28, 2026
Commits (30d): 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/tesseract-ocr/tesseract"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
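A sketch for scripting the endpoint above with a once-per-day cache, so repeated runs stay well under the anonymous daily limit. It assumes the URL pattern `/api/v1/quality/<category>/<owner>/<repo>` generalizes from the single example shown (only the `ml-frameworks/tesseract-ocr/tesseract` path is confirmed), and it sends no API key, since the page does not say how a key is passed. `quality_url`, `fetch_quality`, and `CACHE_DIR` are hypothetical names.

```shell
# Hypothetical helper: build the quality-API URL for OWNER/REPO.
# Assumption: other repos follow the same path pattern; CATEGORY defaults to ml-frameworks.
quality_url() {
    owner=$1
    repo=$2
    category=${3:-ml-frameworks}
    printf 'https://pt-edge.onrender.com/api/v1/quality/%s/%s/%s\n' "$category" "$owner" "$repo"
}

# Hypothetical helper: download at most once per day into CACHE_DIR, then print the body.
# Usage: fetch_quality OWNER REPO [CATEGORY]
fetch_quality() {
    cache_dir=${CACHE_DIR:-./quality-cache}
    cache_file="$cache_dir/${1}_${2}_$(date +%Y-%m-%d)"
    mkdir -p "$cache_dir" || return 1
    if [ ! -s "$cache_file" ]; then
        # -f: treat HTTP errors (such as rate limiting) as failures instead of caching them
        curl -fsS "$(quality_url "$1" "$2" "$3")" -o "$cache_file" || { rm -f "$cache_file"; return 1; }
    fi
    cat "$cache_file"
}
```

Caching by date keeps the design simple: a second call on the same day reads the local file and makes no request.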