iamarunbrahma/pdf-to-markdown
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.
This tool helps you convert complex PDF documents into clean, structured Markdown format. It takes your PDF files as input and outputs a Markdown file, preserving elements like text, tables, images, and code blocks, even with multi-column layouts. It's ideal for anyone who needs to extract detailed information from PDFs for use in advanced text analysis or AI-driven question-answering systems.
115 stars. No commits in the last 6 months.
Use this if you need to transform PDF documents into a structured text format that can be easily processed by AI models for tasks like generating summaries or answering questions.
Not ideal if you need to convert other file types, process extremely large PDFs quickly, or perfectly convert highly specialized mathematical formulas.
Stars: 115
Forks: 12
Language: Python
License: MIT
Category:
Last pushed: Nov 22, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/iamarunbrahma/pdf-to-markdown"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
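The same request can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions: the path layout (`/quality/<category>/<owner>/<repo>`) is inferred from the single curl URL above, the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON. How an API key is passed is not stated here, so the sketch makes only keyless requests.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository given as "owner/name"."""
    owner, name = repo.split("/", 1)
    parts = (category, owner, name)
    # Percent-encode each segment so unusual characters cannot alter the path.
    return BASE_URL + "/" + "/".join(quote(p, safe="") for p in parts)


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Request the data and decode it, assuming the body is JSON."""
    request = urllib.request.Request(
        build_quality_url(category, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("rag", "iamarunbrahma/pdf-to-markdown")
```

Keeping URL construction separate from the network call makes it possible to check the URL without spending any of the 100 keyless requests per day.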
Higher-rated alternatives
any4ai/AnyCrawl
AnyCrawl: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts...
kreuzberg-dev/html-to-markdown
High performance and CommonMark compliant HTML to Markdown converter. Maintained by the...
ScrapeGraphAI/Scrapegraph-ai
Python scraper based on AI
adbar/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping,...
paulpierre/markdown-crawler
A multithreaded web crawler that recursively crawls a website and creates a markdown file...