tesseract-ocr/tesseract
Tesseract Open Source OCR Engine (main repository)
Tesseract OCR converts images containing text, such as scanned documents or photographs, into editable and searchable digital text. You provide an image file (a PDF has to be converted to images first, since Tesseract does not read PDF input directly), and it outputs the text in formats such as plain text or searchable PDF. This tool is ideal for anyone who needs to extract text from images for archiving, analysis, or further processing.
72,883 stars. Actively maintained with 1 commit in the last 30 days.
Use this if you need to accurately extract text from scanned documents, photographs, or other image files in over 100 languages.
Not ideal if you require a graphical user interface (GUI) or expect perfect recognition from poor-quality images without preprocessing them first.
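The image-in, text-out workflow described above can be sketched as a thin Python wrapper around the `tesseract` command-line program. This is a minimal sketch, not an official binding: the helper names `build_tesseract_command` and `run_ocr` are hypothetical, and it assumes the `tesseract` executable and the requested language data are installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, lang="eng", formats=("txt",)):
    """Build the argument list for one tesseract run (hypothetical helper).

    The CLI takes an input image, an output base name (no extension),
    an optional -l language code, and config names such as txt or pdf
    that select the output formats.
    """
    cmd = ["tesseract", str(image), str(output_base), "-l", lang]
    cmd.extend(formats)
    return cmd


def run_ocr(image, output_base, lang="eng", formats=("txt",)):
    """Run tesseract and return the paths it is expected to write."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(
        build_tesseract_command(image, output_base, lang, formats),
        check=True,
    )
    # tesseract appends the format name as the file extension.
    return [Path(f"{output_base}.{fmt}") for fmt in formats]
```

For example, `run_ocr("scan.png", "out", lang="eng+deu", formats=("txt", "pdf"))` would ask for English and German recognition and write `out.txt` and a searchable `out.pdf`; `scan.png` is a placeholder file name.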
Stars: 72,883
Forks: 10,541
Language: C++
License: Apache-2.0
Category:
Last pushed: Feb 28, 2026
Commits (30d): 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/tesseract-ocr/tesseract"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Related frameworks
ogkalu2/comic-translate
Desktop app for automatically translating comics - BDs, Manga, Manhwa, Fumetti and more in a...
naptha/tesseract.js
Pure Javascript OCR for more than 100 Languages 📖🎉🖥
mayocream/koharu
ML-powered manga translator, written in Rust.
mindspore-lab/mindocr
A toolbox of OCR models and algorithms based on MindSpore
zyddnys/manga-image-translator
Translate manga/image (one-click translation of text in all kinds of images) https://cotrans.touhou.ai/ (no longer working)