GiftMungmeeprued/document-parsers-list
A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each parser is tested for support of tables, equations, handwriting, two-column layouts, and multi-column layouts.
This project helps you choose the best tool to convert your PDFs into usable text or structured data. It provides a detailed comparison of document parsing tools, showing which types of content each one can accurately extract, such as tables, equations, or handwritten notes. Anyone who regularly needs to pull information out of PDF documents for analysis or further processing, such as researchers, data analysts, or legal professionals, would find this useful.
177 stars. No commits in the last 6 months.
Use this if you need to extract specific information from PDFs and want to quickly compare tools based on their ability to handle complex layouts like tables, equations, or multi-column text.
Not ideal if you are looking for a tool to simply view PDFs or if your PDFs contain only basic text without any complex formatting.
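Comparing tools this way amounts to filtering a capability matrix: each parser either supports a tested feature or it does not, and you keep the parsers that cover everything you need. The sketch below illustrates that idea. The five feature names mirror the capabilities the list tests for, but the parser names (`parser_a`, `parser_b`, `parser_c`), their support flags, and the helper `matching_parsers` are hypothetical placeholders, not results taken from the list.

```python
from dataclasses import dataclass, field

# The five capabilities the list tests each parser for (from its description).
FEATURES = {"tables", "equations", "handwriting", "two_column", "multi_column"}


@dataclass(frozen=True)
class Parser:
    name: str
    supports: frozenset = field(default_factory=frozenset)


def matching_parsers(parsers, required):
    """Return the names of parsers that support every required feature."""
    required = set(required)
    unknown = required - FEATURES
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    # A parser qualifies when the required set is a subset of what it supports.
    return [p.name for p in parsers if required <= p.supports]


# Hypothetical entries: placeholder names and flags, NOT data from the list.
parsers = [
    Parser("parser_a", frozenset({"tables", "two_column"})),
    Parser("parser_b", frozenset({"tables", "equations", "multi_column"})),
    Parser("parser_c", frozenset({"handwriting"})),
]

print(matching_parsers(parsers, {"tables", "equations"}))  # → ['parser_b']
```

Rejecting unknown feature names up front catches typos such as `"table"` that would otherwise silently match nothing.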
Stars: 177
Forks: 3
Language: —
License: —
Category: —
Last pushed: Jul 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/GiftMungmeeprued/document-parsers-list"
Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.
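The same request can be made from Python with only the standard library. This is a minimal sketch that assumes the endpoint follows the `/api/v1/quality/<category>/<owner>/<repo>` pattern seen in the curl example and that it returns JSON; the helper names `quality_url` and `fetch_quality` are hypothetical, and the sketch does not cover how an API key is passed, since that is not documented here.

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the per-repository endpoint (assumed pattern: <category>/<owner>/<repo>)."""
    return f"{API_BASE}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body, assuming the response is JSON."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_quality() to hit the network.
    print(quality_url("nlp", "GiftMungmeeprued/document-parsers-list"))
```

Keeping URL construction separate from the network call makes the path easy to check without spending any of the daily request quota.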
Higher-rated alternatives
google/langextract
A Python library for extracting structured information from unstructured text using LLMs with...
Extralit/extralit
Fast and accurate systemic data extraction with LLM assistance
Keyvanhardani/german-ocr
German-OCR is specifically trained to extract text from German documents including invoices,...
oidlabs-com/Lexoid
Multimodal document parser for high quality data understanding and extraction
xingbow/SciDaEx
Structured data extraction from research literature