AlibabaResearch/AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

/ 100

Emerging

This project helps professionals teach machines to "read" and understand information from various image and document formats. It takes in images, scanned documents, or web page code and extracts text, recognizes key information, analyzes document layouts, and even generates visual text for different contexts. Data analysts, content managers, and anyone dealing with large volumes of visual data can use this to automate data extraction and document processing.

1,823 stars. No commits in the last 6 months.

Use this if you need to extract structured data from images, scanned documents, or web pages, or to generate realistic text within images.

Not ideal if your primary need is general-purpose natural language processing on text that is already digitized and structured.

document-processing data-extraction content-management digital-transformation web-scraping

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 20 / 25

Stars 1,823

Forks 199

Language C++

License Apache-2.0

Higher-rated alternatives

ogkalu2/comic-translate

Desktop app for automatically translating comics - BDs, Manga, Manhwa, Fumetti and more in a...

naptha/tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥

mayocream/koharu

ML-powered manga translator, written in Rust.

tesseract-ocr/tesseract

Tesseract Open Source OCR Engine (main repository)

zyddnys/manga-image-translator

Translate manga/image (one-click translation of text in all kinds of images) https://cotrans.touhou.ai/ (no longer working)
