mindsdb/aipdf

A tool to OCR PDFs using gen-AI models

53 / 100 (Established)

This tool helps developers transform PDF documents into structured data like Markdown, or extract specific information such as tables or chart data in JSON format. It processes PDFs as file objects, allowing input from various sources like local files, URLs, or S3 buckets, and outputs the extracted content page by page. It's designed for software engineers or data engineers who need to automate complex data extraction from many PDFs.
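The "PDFs as file objects" idea can be illustrated without touching the library itself. The sketch below uses only the Python standard library to turn bytes, a URL, or a local path into a binary file object. The helper names `open_pdf_source` and `looks_like_pdf` are hypothetical and are not part of aipdf; the library's actual entry point and its parameters should be taken from its README.

```python
import io
from pathlib import Path
from typing import BinaryIO, Union
from urllib.request import urlopen


def open_pdf_source(source: Union[str, bytes, Path]) -> BinaryIO:
    """Return a seekable binary file object for a PDF.

    Hypothetical helper (not part of aipdf). Accepts raw bytes (for
    example, an object already downloaded from S3), an http(s) URL,
    or a local filesystem path.
    """
    if isinstance(source, bytes):
        return io.BytesIO(source)
    text = str(source)
    if text.startswith(("http://", "https://")):
        # Read the whole response into memory so the result is seekable.
        with urlopen(text, timeout=30) as response:
            return io.BytesIO(response.read())
    return open(text, "rb")


def looks_like_pdf(stream: BinaryIO) -> bool:
    """Check for the '%PDF-' magic header, then rewind the stream."""
    header = stream.read(5)
    stream.seek(0)
    return header == b"%PDF-"
```

Whatever the source, the result is the same kind of object, so the extraction step that follows does not need to know where the PDF came from.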

Used by 1 other package. Available on PyPI.

Use this if you are a developer building applications that need to programmatically extract detailed information from diverse PDF documents, especially those with complex layouts or scanned content, using AI models.

Not ideal if you need a no-code solution or a graphical user interface for manual PDF data extraction, or if you don't have programming experience.

document-processing data-extraction pdf-automation content-conversion ai-powered-data-capture
Maintenance 6 / 25
Adoption 9 / 25
Maturity 25 / 25
Community 13 / 25

Stars: 46
Forks: 6
Language: Python
License: MIT
Category: ai-pdf-saas
Last pushed: Dec 22, 2025
Commits (30d): 0
Dependencies: 2
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/mindsdb/aipdf"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
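For use from a script, the same request can be made from Python with the standard library. This is a minimal sketch, not a documented client. It assumes the URL path follows a collection/owner/repo pattern (inferred from the single URL above) and that the endpoint returns JSON. The names `quality_url` and `fetch_quality` are hypothetical, and no API-key header is sent because the page does not say how a key is passed.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base path taken from the curl example; everything after it is assumed
# to follow a <collection>/<owner>/<repo> pattern.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(collection: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = (collection, owner, repo)
    return BASE_URL + "/" + "/".join(quote(s, safe="") for s in segments)


def fetch_quality(collection: str, owner: str, repo: str, timeout: float = 30.0):
    """Fetch the quality record; assumes the response body is JSON."""
    request = Request(
        quality_url(collection, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("llm-tools", "mindsdb", "aipdf")` issues the same GET request as the curl command. With the keyless limit of 100 requests/day, caching the result locally is a sensible default.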