QuivrHQ/MegaParse

File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.

Score: 44 / 100 (Emerging)

This tool helps you convert complex documents like PDFs, Word files, and PowerPoints into a clean, comprehensive text format that AI models (LLMs) can easily understand. It takes your existing documents and processes them, ensuring all critical information, including tables and images, is preserved, producing highly accurate text ready for AI analysis or querying. Anyone building applications that use AI to read and interpret business documents, research papers, or reports would find this useful.

7,347 stars. No commits in the last 6 months.

Use this if you need to reliably extract all content from diverse document types (PDFs, Word, PowerPoint) for use with large language models, without losing any critical information like tables or image context.

Not ideal if you only need simple text extraction without concern for preserving complex formatting, tables, or integrating with advanced AI models.

Tags: document-processing, AI-application-development, information-extraction, knowledge-management
Status: Stale (6 months), No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 18 / 25

The four category scores above add up to the overall score: 0 + 10 + 16 + 18 = 44.

Stars: 7,347
Forks: 416
Language: Python
License: Apache-2.0
Last pushed: Feb 21, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/QuivrHQ/MegaParse"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
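The same request can be made from Python with only the standard library. This is a minimal sketch, not an official client. The names `quality_url` and `fetch_quality` are hypothetical, and reading the URL path as category / owner / repo is an assumption drawn from the curl example above. The response is assumed to be JSON, and its fields are not documented here, so the sketch returns the decoded body without interpreting it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is assumed
# to be <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters stay in place.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body as JSON (assumed response format)."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("llm-tools", "QuivrHQ", "MegaParse")
# print(json.dumps(data, indent=2))
```

Keeping URL construction separate from the network call makes it possible to check the URL offline, without spending any of the 100 keyless requests per day.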