wisupai/e2m

E2M converts a range of file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It is easy to install, provides dedicated parsers and converters, and supports custom configuration, making it a flexible, open-source, all-in-one solution.

Score: 42 / 100 (Emerging)

This tool helps data scientists, AI engineers, and content managers prepare diverse content for AI models. It takes documents such as PDFs, Word files, web pages, and audio recordings, extracts their content, and converts it into structured Markdown. The output is intended as input data for training or fine-tuning models and for tasks like retrieval-augmented generation (RAG).

1,276 stars. No commits in the last 6 months.

Use this if you need to standardize and prepare a wide array of unstructured data, including documents, web content, and audio, into a clean Markdown format suitable for AI model training or RAG applications.

Not ideal if you only need to view or edit documents in their original format, or if your primary goal is simple, manual conversion for human readability without AI integration.

data-preparation AI-training content-standardization document-processing RAG-data-pipeline
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 16 / 25
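
The four category scores listed above add up to the overall figure of 42 / 100. The short sketch below only checks that arithmetic. It assumes the overall score is a plain sum of the categories, which is an observation from the numbers on this page and not a documented scoring method.

```python
# Category scores as listed on this page (each out of 25).
category_scores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 16,
}

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(category_scores.values())
maximum = 25 * len(category_scores)

print(f"{total} / {maximum}")  # prints "42 / 100"
```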


Stars: 1,276
Forks: 72
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Sep 08, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/wisupai/e2m"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
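
For use from a script, the same request as the curl command can be sketched in Python with only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, and the code assumes the endpoint returns a JSON body. The response schema is not described on this page, so no fields are read from it here.

```python
import json
import urllib.request

# Base path taken from the curl example; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Request the quality data and decode it.

    Assumption: the response body is JSON. No API key is sent, so the
    keyless daily limit mentioned on this page would apply.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("wisupai", "e2m")` would request the same URL as the curl command. Since the keyless tier is limited to 100 requests per day, it may be worth caching the result locally instead of re-fetching it on every run.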