NanoNets/docext

An on-premises, OCR-free toolkit for unstructured data extraction, Markdown conversion, and benchmarking (https://idp-leaderboard.org/).

Score: 56 / 100 (Established)

This toolkit helps businesses convert documents such as invoices, passports, PDFs, and images into structured Markdown, or extract specific fields from them. It turns unstructured documents into organized data that is ready for analysis or integration, all without needing an internet connection. It is aimed at operations managers, compliance officers, and data entry teams that handle large volumes of documents.

1,871 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to extract specific details from documents, convert them into a structured markdown format, or benchmark the performance of document processing AI models, all while keeping your data on your own servers.

Not ideal if you need a cloud-based solution or if your primary need is simple text recognition without complex semantic understanding or structured data extraction.

document-processing data-extraction compliance operations-management information-management
Stale 6m
Maintenance 2 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 19 / 25
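The four component scores above are each out of 25, and they add up to the headline figure: 2 + 10 + 25 + 19 = 56 out of 100. The page does not state the scoring formula, so the sketch below assumes the overall score is a plain sum of the components; the names `COMPONENTS`, `MAX_PER_COMPONENT`, and `overall_score` are illustrative only.

```python
# Component scores as listed on this page, each out of 25.
# Assumption: the overall score is the plain sum of these four components.
COMPONENTS = {
    "Maintenance": 2,
    "Adoption": 10,
    "Maturity": 25,
    "Community": 19,
}
MAX_PER_COMPONENT = 25


def overall_score(components: dict[str, int]) -> tuple[int, int]:
    """Return (total, maximum) for a set of component scores."""
    total = sum(components.values())
    maximum = MAX_PER_COMPONENT * len(components)
    return total, maximum


total, maximum = overall_score(COMPONENTS)
print(f"{total} / {maximum}")  # prints "56 / 100"
```

Under this reading, Maintenance (2 / 25) is the component holding the total down, which is consistent with the "Stale 6m" flag and the zero commits in the last 30 days.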


Stars: 1,871
Forks: 135
Language: Python
License: Apache-2.0
Last pushed: Aug 25, 2025
Commits (30d): 0
Dependencies: 20

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/NanoNets/docext"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
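The same request can be made from Python with only the standard library. This is a minimal sketch of the anonymous (no-key) call; the page does not say how the free key is supplied, so that path is not shown. The helper names `quality_url` and `fetch_quality` are hypothetical, and it is an assumption that the endpoint returns a JSON object and that other repositories can be substituted into the `owner/repo` part of the path under the same `rag` category.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
# Assumption: other owner/repo pairs can be substituted into the path.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-data URL for a GitHub owner/repo pair."""
    # Percent-encode each path segment so unusual names stay valid.
    return f"{BASE_URL}/{urllib.parse.quote(owner, safe='')}/{urllib.parse.quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch quality data anonymously (counts toward the 100 requests/day limit).

    Assumes the response body is a JSON object; its fields are not documented here.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(quality_url("NanoNets", "docext"))
    # data = fetch_quality("NanoNets", "docext")  # performs a real network request
```

Keeping URL construction separate from the network call makes it easy to check the endpoint offline and to cache results, which matters under a daily request cap.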