NameetP/pdfmux
PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.
pdfmux turns PDFs into usable text, structured data, or chunks for analysis. It accepts any PDF document (digital, scanned, or complex) and produces clean Markdown, JSON, or research-ready text chunks. It also audits its own output and re-extracts pages that fell short on the first pass. Anyone who regularly pulls data from PDFs for reporting, research, or content creation will find this useful.
Available on PyPI.
Use this if you routinely work with diverse PDF documents and need reliable, high-quality text or data extraction without manual oversight or complex configurations.
Not ideal if your extraction needs are very simple, such as pulling basic text from purely digital, text-selectable PDFs, where copy-paste or a simple library would suffice.
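The audit-and-re-extract behavior described above can be illustrated with a small sketch. This is not pdfmux's actual API or algorithm: the names `Extractor`, `page_looks_ok`, and `extract_with_audit`, and the character-count audit rule, are all hypothetical and chosen only to show the general idea of checking each page and falling back to another extractor when the check fails.

```python
from typing import Callable, Dict, List

# Hypothetical type: an extractor maps a zero-based page index to extracted text.
Extractor = Callable[[int], str]


def page_looks_ok(text: str, min_chars: int = 20) -> bool:
    """Hypothetical audit rule: pass if the page has enough non-whitespace characters."""
    return len("".join(text.split())) >= min_chars


def extract_with_audit(page_count: int, extractors: List[Extractor]) -> Dict[int, str]:
    """Try extractors in order for each page; stop at the first result that passes the audit.

    If no extractor passes, the last attempt is kept for that page.
    """
    results: Dict[int, str] = {}
    for page in range(page_count):
        text = ""
        for extract in extractors:
            text = extract(page)
            if page_looks_ok(text):
                break  # audit passed, no re-extraction needed
        results[page] = text
    return results


# Stand-in extractors for illustration only (no real PDF is read here).
def fast_extractor(page: int) -> str:
    # Pretend page 1 is a scanned page with no text layer.
    return "" if page == 1 else f"Digital text layer for page {page}, long enough."


def fallback_extractor(page: int) -> str:
    return f"OCR text recovered for scanned page {page} here."


if __name__ == "__main__":
    for page, text in extract_with_audit(3, [fast_extractor, fallback_extractor]).items():
        print(page, text)
```

Ordering the extractors from cheapest to most expensive means the slower fallback only runs on pages that fail the audit.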
Stars: 31
Forks: 2
Language: Python
License: MIT
Category:
Last pushed: Mar 18, 2026
Commits (30d): 0
Dependencies: 4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/NameetP/pdfmux"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
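The same request can be sketched in Python using only the standard library. The URL pattern is taken from the curl command above; the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumption: the response body is JSON. The response schema is not
    documented on this page, so the parsed value is returned as-is.
    """
    with urlopen(build_quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request and counts against the daily quota.
    print(fetch_quality("NameetP", "pdfmux"))
```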
Higher-rated alternatives
mindsdb/aipdf
A tool to OCR PDFs using gen-AI models
3Alan/DocsMind
🤖 DocsMind allows you to chat with your docs and summarize your docs, support pdf, md.
zeus-12/uxie
pdf reader app with note taking, annotations, collaboration, ai features (chat, flashcards...
voelspriet/aiwhisperer
DPG Campus Tool. Shrink massive PDFs to fit AI upload limits. Sanitize before uploading to...
anand-mukul/PDFNinja
PDF Ninja is a modern AI-powered PDF SaaS built with Next.js, enabling seamless document...