AlibabaResearch/AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Quality score: 46 / 100 (Emerging)

This project helps professionals teach machines to "read" and understand information from various image and document formats. It takes in images, scanned documents, or web page code and extracts text, recognizes key information, analyzes document layouts, and even generates visual text for different contexts. Data analysts, content managers, and anyone dealing with large volumes of visual data can use this to automate data extraction and document processing.

1,823 stars. No commits in the last 6 months.

Use this if you need to extract structured data from images, scanned documents, or web pages, or generate realistic text within images, making your data more accessible and usable.

Not ideal if your primary need is general-purpose natural language processing on text that is already digitized and structured.

document-processing data-extraction content-management digital-transformation web-scraping
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 20 / 25
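
The four category scores above are each out of 25, and in this listing they add up to the overall 46 / 100. The short check below assumes the overall score is a plain sum of the categories. The page does not state the formula, so treat that as an assumption rather than the documented method.

```python
# Category scores copied from the listing above (each is out of 25).
category_scores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 20,
}

# Assumption: the overall score is the plain sum of the four categories.
overall = sum(category_scores.values())
maximum = 25 * len(category_scores)

print(f"{overall} / {maximum}")
```

Under that assumption, the Maintenance score of 0 accounts for the largest gap, since it forfeits a full quarter of the available points.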


Stars: 1,823
Forks: 199
Language: C++
License: Apache-2.0
Category: latex-ocr-tools
Last pushed: Apr 09, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/AlibabaResearch/AdvancedLiterateMachinery"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
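
The same request can be made from Python. The sketch below uses only the standard library and rebuilds the URL from the curl example. The helper names `build_quality_url` and `fetch_quality` and the parameter name `domain` are hypothetical, and the code assumes the endpoint returns a JSON body, which the page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(domain: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, following the path layout of the curl example.

    `domain` is a hypothetical name for the first path segment
    ("ml-frameworks" in the curl example).
    """
    # Percent-encode each segment so unusual characters cannot break the path.
    path = "/".join(quote(part, safe="") for part in (domain, owner, repo))
    return f"{BASE_URL}/{path}"


def fetch_quality(url: str, timeout: float = 10.0):
    """Fetch the URL and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Building the URL does not touch the network.
url = build_quality_url("ml-frameworks", "AlibabaResearch", "AdvancedLiterateMachinery")
print(url)
```

Calling `fetch_quality(url)` performs the actual request, so each unauthenticated call counts against the 100 requests/day limit.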