iamarunbrahma/pdf-to-markdown

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.

Score: 39 / 100 (Emerging)

This tool helps you convert complex PDF documents into clean, structured Markdown format. It takes your PDF files as input and outputs a Markdown file, preserving elements like text, tables, images, and code blocks, even with multi-column layouts. It's ideal for anyone who needs to extract detailed information from PDFs for use in advanced text analysis or AI-driven question-answering systems.

115 stars. No commits in the last 6 months.

Use this if you need to transform PDF documents into a structured text format that can be easily processed by AI models for tasks like generating summaries or answering questions.

Not ideal if you need to convert other file types, process extremely large PDFs quickly, or perfectly convert highly specialized mathematical formulas.

document-processing information-extraction content-preparation text-analysis AI-data-preparation
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 13 / 25
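
The four category scores listed above add up to the overall 39 / 100. The sketch below checks that arithmetic. It assumes the overall score is a plain, unweighted sum of the categories, which the page does not state explicitly, and `overall_score` is a hypothetical helper name.

```python
# Category scores as listed on this page (each out of 25).
category_scores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 13,
}


def overall_score(scores):
    """Sum the category scores.

    Assumption: the overall score is an unweighted sum of the categories.
    """
    return sum(scores.values())


total = overall_score(category_scores)
maximum = 25 * len(category_scores)
print(f"{total} / {maximum}")  # prints "39 / 100"
```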


Stars: 115
Forks: 12
Language: Python
License: MIT
Last pushed: Nov 22, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/iamarunbrahma/pdf-to-markdown"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
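
The same request can be made from Python with only the standard library. This is a minimal sketch, not a documented client. It assumes the URL path follows a category/owner/repo pattern (inferred from the single curl example above) and that the endpoint returns JSON. `quality_url` and `fetch_quality` are hypothetical helper names, and the response fields are not described on this page.

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, as in the curl example.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the quality data and parse it.

    Assumption: the endpoint responds with a JSON body.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("rag", "iamarunbrahma", "pdf-to-markdown")` requests the same URL as the curl command. At 100 keyless requests per day, a script that polls many repositories would likely need to cache results or use a key.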