lakinduboteju/langchain-pymupdf4llm

An integration package connecting PyMuPDF4LLM to LangChain

Quality score: 37 / 100 (Emerging)

This project helps you turn complex PDF documents into structured Markdown text, making it easy to use their content with AI tools. You input a PDF file, and it outputs a clean Markdown version, complete with formatted text, tables, and even descriptions of images. This is ideal for anyone who needs to extract information from PDFs for use in large language models or AI-powered content generation.

Available on PyPI.
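As a rough illustration of the PDF-to-Markdown flow described above, the sketch below loads a PDF and joins the extracted text into one Markdown string. It assumes the package installs as `langchain-pymupdf4llm` and exposes a `PyMuPDF4LLMLoader` class in the `langchain_pymupdf4llm` module with the usual LangChain `load()` method; check the project's README to confirm. The helper names `load_pdf_documents` and `join_markdown` are hypothetical and exist only for this example.

```python
from typing import Any, Iterable, List


def load_pdf_documents(pdf_path: str) -> List[Any]:
    """Load a PDF into LangChain Document objects.

    Assumption: `PyMuPDF4LLMLoader` is the loader class exported by the
    package; it is imported lazily so this module can be imported even
    when the package is not installed.
    """
    from langchain_pymupdf4llm import PyMuPDF4LLMLoader  # assumed import path

    loader = PyMuPDF4LLMLoader(pdf_path)
    return loader.load()


def join_markdown(docs: Iterable[Any]) -> str:
    """Concatenate the `page_content` of each document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)
```

With the package installed, `join_markdown(load_pdf_documents("report.pdf"))` would give a single Markdown string suitable for passing to a text splitter or an LLM prompt; `report.pdf` is a placeholder path.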

Use this if you need to reliably convert the content of PDF documents, including complex layouts and images, into a clean Markdown format for AI processing or knowledge bases.

Not ideal if your primary goal is simply viewing PDFs or if you do not plan to integrate the extracted text with AI models or advanced text processing.

document-processing content-extraction knowledge-management AI-data-preparation information-retrieval
Maintenance 6 / 25
Adoption 6 / 25
Maturity 20 / 25
Community 5 / 25

Stars: 16
Forks: 1
Language: Python
License: MIT
Category: pdf-qa-systems
Last pushed: Nov 23, 2025
Commits (30d): 0
Dependencies: 2

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lakinduboteju/langchain-pymupdf4llm"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
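The same request can be made from Python using only the standard library. This is a minimal sketch: it assumes the endpoint returns a JSON body, which the page does not state, and the function names `build_quality_url` and `fetch_quality` are made up for this example.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the quality-data URL for a given owner/repo pair."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the quality data.

    Assumption: the response body is JSON; the page does not document
    the response schema, so the result is returned as-is.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("lakinduboteju", "langchain-pymupdf4llm")` would issue the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit, so cache the result if you poll regularly.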