luisleo526/doc2mark
AI-powered Python library that converts documents (PDF, Word, Excel, PowerPoint, HTML) to clean Markdown, preserving complex tables and layouts through AI-based OCR.
This tool helps professionals convert various document types like PDFs, Word files, Excel spreadsheets, or even scanned images into clean Markdown text. You feed it a document, and it outputs a well-structured Markdown version, accurately preserving complex tables and layouts, even from scanned pages. It's ideal for content managers, researchers, or anyone needing to transform diverse documents into a consistent, editable text format.
Used by 1 other package. Available on PyPI.
Use this if you need to extract content from a wide range of documents, including those with complex tables or scanned text, and convert it into a clean, easy-to-use Markdown format.
Not ideal if your primary goal is simply viewing documents, or if you need to retain the original document's exact visual formatting and interactive elements rather than just its content structure.
Stars: 47
Forks: 3
Language: Python
License: MIT
Category:
Last pushed: Mar 04, 2026
Commits (30d): 0
Dependencies: 11
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/luisleo526/doc2mark"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
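The same request can be made from Python. The sketch below uses only the standard library and the endpoint shown in the curl command above. It assumes the endpoint returns a JSON body, which this page does not confirm, and the helper names `build_url` and `fetch_quality` are hypothetical, chosen for illustration only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def build_url(owner: str, repo: str) -> str:
    """Build the per-repository URL (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. The response schema is not
    documented on this page, so the parsed value is returned unchanged.
    """
    request = urllib.request.Request(
        build_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Each call counts against the 100 requests/day keyless limit.
    data = fetch_quality("luisleo526", "doc2mark")
    print(json.dumps(data, indent=2))
```

Keeping URL construction separate from the network call makes it possible to check the request target without spending any of the daily quota.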
Related tools
any4ai/AnyCrawl
AnyCrawl: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts...
kreuzberg-dev/html-to-markdown
High performance and CommonMark compliant HTML to Markdown converter. Maintained by the...
ScrapeGraphAI/Scrapegraph-ai
Python scraper based on AI
adbar/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping,...
paulpierre/markdown-crawler
A multithreaded web crawler that recursively crawls a website and creates a markdown file...