tesseract-ocr/tesseract

Tesseract Open Source OCR Engine (main repository)

61 / 100

Established

Tesseract OCR helps you convert scanned documents, images, or PDFs containing text into editable and searchable digital text. You provide an image file, and it outputs the text in various formats like plain text or searchable PDF. This tool is ideal for anyone who needs to extract text from images for archiving, analysis, or further processing.
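As a rough illustration of the image-in, text-out workflow described above, the sketch below wraps the `tesseract` command-line program from Python. It assumes Tesseract is installed and on your `PATH`. The helper names `build_tesseract_command` and `run_ocr`, and the file names in the usage note, are hypothetical and chosen only for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, lang="eng", output_format="txt"):
    """Return the argv list for one Tesseract CLI run.

    The CLI takes an input image, an output base name (Tesseract appends
    the extension itself), a language code, and an output config such as
    "txt" or "pdf".
    """
    return ["tesseract", str(image), str(output_base), "-l", lang, output_format]


def run_ocr(image, output_base, lang="eng", output_format="txt"):
    """Run Tesseract and return the path of the file it is expected to write."""
    # Fail early with a clear message if the executable is missing.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image, output_base, lang, output_format)
    subprocess.run(command, check=True)
    return Path(f"{output_base}.{output_format}")
```

For example, `run_ocr("scan.png", "scan", output_format="pdf")` would be expected to produce a searchable `scan.pdf`, provided the English language data is installed. As the card notes, results depend heavily on image quality, so poor scans may need enhancement first.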

72,883 stars. Actively maintained with 1 commit in the last 30 days.

Use this if you need to accurately extract text from scanned documents, photographs, or other image files in over 100 languages.

Not ideal if you require a graphical user interface (GUI) or expect perfect recognition from poor quality images without prior enhancement.

document-digitization data-entry-automation text-extraction digital-archiving content-conversion

No Package No Dependents

Maintenance 13 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 22 / 25

Stars 72,883

Forks 10,541

Language C++

License Apache-2.0

Recent Releases

5.5.2, 26 Dec 2025
5.5.1, 25 May 2025
5.5.0, 10 Nov 2024
5.4.1, 11 Jun 2024
5.4.0, 06 Jun 2024

Related frameworks

ogkalu2/comic-translate

Desktop app for automatically translating comics - BDs, Manga, Manhwa, Fumetti and more in a...

naptha/tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥

mayocream/koharu

ML-powered manga translator, written in Rust.

mindspore-lab/mindocr

A toolbox of ocr models and algorithms based on MindSpore

zyddnys/manga-image-translator

Translate manga/image. One-click translation of text in all kinds of images. https://cotrans.touhou.ai/ (no longer working)
