luisleo526/doc2mark

Python library that converts documents (PDF, Word, Excel, PowerPoint, HTML) to clean Markdown, using AI-powered OCR to preserve complex tables and layouts.

50 / 100
Established

This tool helps professionals convert various document types like PDFs, Word files, Excel spreadsheets, or even scanned images into clean Markdown text. You feed it a document, and it outputs a well-structured Markdown version, accurately preserving complex tables and layouts, even from scanned pages. It's ideal for content managers, researchers, or anyone needing to transform diverse documents into a consistent, editable text format.

Used by 1 other package. Available on PyPI.

Use this if you need to extract content from a wide range of documents, including those with complex tables or scanned text, and convert it into a clean, easy-to-use Markdown format.

Not ideal if your primary goal is simply viewing documents, or if you need to retain the original document's exact visual formatting and interactive elements rather than just its content structure.

document-conversion content-extraction data-preparation research-assist knowledge-management
Maintenance 10 / 25
Adoption 9 / 25
Maturity 24 / 25
Community 7 / 25

Stars: 47
Forks: 3
Language: Python
License: MIT
Last pushed: Mar 04, 2026
Commits (30d): 0
Dependencies: 11
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/luisleo526/doc2mark"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
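
The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint returns a JSON body and that other repositories follow the same `owner/repo` path pattern as the curl example; neither is confirmed by this page. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (path pattern is an assumption)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record without an API key.

    Assumes the response body is JSON; the field names are not documented
    here, so the parsed object is returned as-is.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("luisleo526", "doc2mark")` issues the same GET request as the curl command. Keyless access is limited to 100 requests/day, so results are worth caching locally when polling many repositories.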