NameetP/pdfmux

PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.

46 / 100 (Emerging)

pdfmux turns PDFs into usable text, structured data, or chunks for analysis. It accepts any PDF document, whether digital, scanned, or complex, and outputs clean Markdown, JSON, or research-ready text chunks. It also audits its own output and re-extracts pages that fell short on the first pass. It suits anyone who regularly pulls data from PDFs for reporting, research, or content creation.

Available on PyPI.

Use this if you routinely work with diverse PDF documents and need reliable, high-quality text or data extraction without manual oversight or complex configurations.

Not ideal if your extraction needs are simple, such as pulling plain text from purely digital, text-selectable PDFs, where copy-paste or a basic library would suffice.

document-analysis data-extraction research-automation content-preparation information-retrieval
Maintenance 13 / 25
Adoption 7 / 25
Maturity 20 / 25
Community 6 / 25

Stars: 31
Forks: 2
Language: Python
License: MIT
Category: ai-pdf-saas
Last pushed: Mar 18, 2026
Commits (30d): 0
Dependencies: 4

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/NameetP/pdfmux"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
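
For scripted access, the curl call above could be mirrored in Python roughly as follows. This is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single URL shown above, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality URL for one repository.

    Assumption: the path is <category>/<owner>/<repo>, generalized from
    the one example URL on this page.
    """
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the quality record.

    Assumption: the response body is JSON; its schema is not documented here.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("llm-tools", "NameetP", "pdfmux")` would request the same URL as the curl command. Keyless use is limited to 100 requests per day, so callers making repeated lookups may want to cache the result.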