microsoft/markitdown

Python tool for converting files and office documents to Markdown.

Score: 71 / 100 (Verified)

MarkItDown helps data scientists, researchers, and AI developers prepare various document types for Large Language Models (LLMs). It takes common formats like PDFs, Word documents, PowerPoint presentations, or even YouTube URLs, and converts them into structured Markdown text. The output preserves key structural elements like headings and tables, making it ideal for text analysis pipelines and LLM ingestion.
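A minimal sketch of that conversion step in Python, assuming the `markitdown` package is installed from PyPI. The helper name `convert_to_markdown` is hypothetical and not part of the library; it only wraps the `MarkItDown().convert(...)` call and returns the `text_content` of the result.

```python
def convert_to_markdown(source: str) -> str:
    """Convert a local file path or URL to Markdown text with MarkItDown.

    `convert_to_markdown` is an illustrative wrapper, not a library function.
    """
    # Imported lazily so this module still loads where markitdown is not installed.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(source)  # source: path or URL given as a string
    return result.text_content          # the converted Markdown as a plain string
```

Calling `convert_to_markdown("report.pdf")` on a hypothetical local file would return its Markdown text, which can then be passed to a text analysis pipeline or an LLM prompt. Some formats may need optional dependencies to be installed alongside the base package.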

90,677 stars. Used by 28 other packages. Actively maintained with 2 commits in the last 30 days. Available on PyPI.

Use this if you need to convert a wide range of file types into a structured, LLM-friendly Markdown format for text analysis or AI model input.

Not ideal if you need high-fidelity document conversions for human consumption where original formatting and visual layout are critical.

Tags: data-preparation, LLM-ingestion, document-processing, text-extraction, AI-pipeline

Maintenance 13 / 25
Adoption 15 / 25
Maturity 25 / 25
Community 18 / 25
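The four category scores add up to the overall score of 71 out of 100. The sketch below checks that arithmetic, assuming the overall score is a plain sum of the categories; the page does not state the aggregation rule, so treat this as an observation rather than the site's documented method.

```python
# Category scores as listed on the page, each out of 25.
category_scores = {
    "Maintenance": 13,
    "Adoption": 15,
    "Maturity": 25,
    "Community": 18,
}


def overall_score(scores: dict) -> int:
    """Sum the category scores (assumed aggregation rule, not confirmed by the page)."""
    return sum(scores.values())


total = overall_score(category_scores)
maximum = 25 * len(category_scores)
print(f"{total} / {maximum}")  # 71 / 100
```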

Stars: 90,677
Forks: 5,354
Language: Python
License: MIT
Last pushed: Mar 10, 2026
Commits (30d): 2
Dependencies: 6
Reverse dependents: 28

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/microsoft/markitdown"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
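The same request can be sketched in Python with only the standard library. This is an illustration under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the URL pattern is inferred from the single example above and may not generalize to other repositories or categories, and the response is assumed to be JSON, which the page does not confirm.

```python
import json
import urllib.request

# Base path taken from the curl example; everything after it is inferred.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner: str, repo: str, category: str = "llm-tools") -> str:
    """Build the endpoint URL, assuming the pattern <base>/<category>/<owner>/<repo>."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, category: str = "llm-tools",
                  timeout: float = 10.0):
    """Fetch the quality record and decode it as JSON (assumed response format)."""
    request = urllib.request.Request(
        quality_url(owner, repo, category),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("microsoft", "markitdown")` would issue the same GET request as the curl command. With the keyless limit of 100 requests/day, caching the result locally is a sensible choice for repeated lookups.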