PaddlePaddle/PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

79 / 100 (Verified)

This tool helps you convert any image or PDF document into structured data like Markdown or JSON. It accurately extracts text and layout information from even challenging documents, making it ready for use in advanced AI applications. Marketing analysts, operations managers, and data entry specialists can use this to automate data extraction from various documents.

72,167 stars. Used by 10 other packages. Actively maintained with 12 commits in the last 30 days. Available on PyPI.

Use this if you need to reliably extract text and structural information from documents, especially those that are scanned, warped, or photographed, and want to use that data for AI applications.

Not ideal if you only need basic text copying and pasting, or if your documents are already in a perfectly editable digital format.

document-processing data-extraction workflow-automation content-digitization information-retrieval
Maintenance 17 / 25
Adoption 15 / 25
Maturity 25 / 25
Community 22 / 25
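
The four sub-scores above are each out of 25 and add up to the overall 79 / 100. The page does not state the scoring formula, so treating the overall score as a plain sum is an assumption. Under that assumption, the arithmetic can be checked with a short sketch:

```python
# Sub-scores as listed above, each out of 25.
sub_scores = {
    "Maintenance": 17,
    "Adoption": 15,
    "Maturity": 25,
    "Community": 22,
}

# Assumption: the overall score is the plain sum of the four sub-scores.
total = sum(sub_scores.values())
maximum = 25 * len(sub_scores)

print(f"{total} / {maximum}")
```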


Stars: 72,167
Forks: 9,954
Language: Python
License: Apache-2.0
Last pushed: Mar 12, 2026
Commits (30d): 12
Dependencies: 4
Reverse dependents: 10

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/PaddlePaddle/PaddleOCR"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
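
The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, and it assumes the endpoint returns a JSON body, which the page does not document. It makes unauthenticated requests only, since the page does not say how a key is passed.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for an owner/repo pair (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for a repository.

    Assumption: the response body is JSON; the schema is not documented
    on the page, so the decoded object is returned unmodified.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs a network request and counts against the daily limit):
# data = fetch_quality("PaddlePaddle", "PaddleOCR")
# print(json.dumps(data, indent=2))
```

Keeping URL construction separate from the network call makes it easy to check the request target without spending any of the 100 daily requests.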