AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
This project helps professionals teach machines to "read" and understand information from various image and document formats. It takes in images, scanned documents, or web page code and extracts text, recognizes key information, analyzes document layouts, and even generates visual text for different contexts. Data analysts, content managers, and anyone dealing with large volumes of visual data can use this to automate data extraction and document processing.
1,823 stars. No commits in the last 6 months.
Use this if you need to extract structured data from images, scanned documents, or web pages, or to generate realistic text within images.
Not ideal if your primary need is general-purpose natural language processing on text that is already digitized and structured.
Stars: 1,823
Forks: 199
Language: C++
License: Apache-2.0
Category:
Last pushed: Apr 09, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/AlibabaResearch/AdvancedLiterateMachinery"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
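The same request can be made from a script. The sketch below is a minimal Python version of the curl call above, using only the standard library. It assumes the three path segments after `quality/` are a category slug, the repository owner, and the repository name (inferred from the example URL), and that the endpoint returns JSON; neither is confirmed by this page. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: the path layout is <category>/<owner>/<repo>, inferred
    from the single example URL shown above.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the data for one repository and parse the response body.

    Assumption: the response body is JSON; its fields are not documented here.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "AlibabaResearch", "AdvancedLiterateMachinery")` issues the same GET request as the curl command. Unauthenticated use counts against the 100 requests/day limit, so a script that polls many repositories may want to cache results.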
Higher-rated alternatives
ogkalu2/comic-translate
Desktop app for automatically translating comics - BDs, Manga, Manhwa, Fumetti and more in a...
naptha/tesseract.js
Pure Javascript OCR for more than 100 Languages 📖🎉🖥
mayocream/koharu
ML-powered manga translator, written in Rust.
tesseract-ocr/tesseract
Tesseract Open Source OCR Engine (main repository)
zyddnys/manga-image-translator
Translate manga/image: one-click translation of text in all kinds of images https://cotrans.touhou.ai/ (no longer working)