mindsdb/aipdf
A tool to OCR PDFs using gen-AI models
This tool helps developers convert PDF documents into structured text such as Markdown, or extract specific information such as tables or chart data as JSON. It accepts PDFs as file objects, so input can come from local files, URLs, or S3 buckets, and it returns the extracted content page by page. It is aimed at software engineers and data engineers who need to automate complex data extraction across many PDFs.
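The file-object input model described above can be sketched as follows. This is a minimal illustration, not aipdf's own code: `to_file_object`, `extract_pages`, and the `ocr_fn` parameter are hypothetical names, and `ocr_fn` stands in for whatever OCR callable you use, assumed here to take a binary file object and yield one string per page.

```python
import io
from pathlib import Path
from typing import BinaryIO, Callable, Iterable, List, Union
from urllib.request import urlopen

# Hypothetical alias: the kinds of input this sketch normalizes.
Source = Union[str, Path, bytes, BinaryIO]


def to_file_object(source: Source) -> BinaryIO:
    """Return a binary file object for a path, URL, raw bytes, or open file."""
    if isinstance(source, bytes):
        return io.BytesIO(source)
    if isinstance(source, (str, Path)):
        text = str(source)
        if text.startswith(("http://", "https://")):
            # Download the whole PDF into memory so it behaves like a file.
            with urlopen(text) as response:
                return io.BytesIO(response.read())
        return open(text, "rb")
    # Anything else is assumed to already be file-like, for example the
    # streaming body of an S3 object fetched with an S3 client.
    return source


def extract_pages(
    source: Source,
    ocr_fn: Callable[[BinaryIO], Iterable[str]],
) -> List[str]:
    """Run a page-by-page OCR callable (an assumption) over any source."""
    file_obj = to_file_object(source)
    try:
        return list(ocr_fn(file_obj))
    finally:
        # Close only file objects created here, never the caller's own.
        if file_obj is not source:
            file_obj.close()
```

Because every source is reduced to a file object first, the OCR step does not need to know where the PDF came from. A caller would pass the real OCR function as `ocr_fn` and get back a list with one entry per page.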
Used by 1 other package. Available on PyPI.
Use this if you are a developer building applications that need to programmatically extract detailed information from diverse PDF documents, especially those with complex layouts or scanned content, using AI models.
Not ideal if you need a no-code solution or a graphical user interface for manual PDF data extraction, or if you don't have programming experience.
Stars: 46
Forks: 6
Language: Python
License: MIT
Category:
Last pushed: Dec 22, 2025
Commits (30d): 0
Dependencies: 2
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/mindsdb/aipdf"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
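The same request can be made from Python. This is a rough sketch using only the standard library; `build_quality_url` and `fetch_quality` are hypothetical helper names, and the URL pattern is inferred from the single curl example above. The response format is not documented here, so the body is returned as plain text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; treat it as an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one tool, escaping each path segment."""
    parts = (quote(category, safe=""), quote(owner, safe=""), quote(repo, safe=""))
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body for one tool (keyless access is rate limited)."""
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For the tool on this page, `fetch_quality("llm-tools", "mindsdb", "aipdf")` requests the same URL as the curl command.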
Related tools
3Alan/DocsMind
🤖 DocsMind allows you to chat with your docs and summarize your docs, support pdf, md.
NameetP/pdfmux
PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.
zeus-12/uxie
pdf reader app with note taking, annotations, collaboration, ai features (chat, flashcards...
voelspriet/aiwhisperer
DPG Campus Tool. Shrink massive PDFs to fit AI upload limits. Sanitize before uploading to...
anand-mukul/PDFNinja
PDF Ninja is a modern AI-powered PDF SaaS built with Next.js, enabling seamless document...