pymupdf/langchain-pymupdf4llm

An integration package connecting PyMuPDF4LLM to LangChain

Score: 48 / 100 (Emerging)

This tool converts complex PDF documents into clean, structured Markdown. It handles PDFs with multiple columns, images, and intricate tables, extracting text, headers, lists, and image descriptions. It is aimed at data scientists and AI developers who process PDF content for large language models or retrieval-augmented generation (RAG) systems.

Use this if you need to reliably convert diverse PDF documents into well-formatted Markdown for use with AI applications like chatbots or knowledge bases.

Not ideal if your primary need is simply to view or print PDFs, or if you require an interactive PDF editing solution.

document-processing AI-data-preparation content-extraction knowledge-management
No Package, No Dependents
Maintenance 13 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 13 / 25


Stars: 17
Forks: 3
Language: Python
License: AGPL-3.0
Category: pdf-qa-systems
Last pushed: Mar 23, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/pymupdf/langchain-pymupdf4llm"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
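
The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl command above. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the code assumes the endpoint returns a JSON body; the response schema is not documented here, so the result is returned as-is rather than parsed into specific fields.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters cannot break the URL.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body, assuming it is JSON."""
    request = urllib.request.Request(
        build_quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("pymupdf", "langchain-pymupdf4llm")` issues one request, which counts against the daily limit described above.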