GiftMungmeeprued/document-parsers-list

A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each parser is tested for support of tables, equations, handwriting, two-column layouts, and multi-column layouts.

/ 100

Experimental

This project helps you choose the best tool to convert your PDFs into usable text or structured data. It provides a detailed comparison of document parsing tools, showing which types of content each can accurately extract, such as tables, equations, or handwritten notes. It is useful for anyone who regularly pulls information out of PDF documents for analysis or further processing, such as researchers, data analysts, or legal professionals.

177 stars. No commits in the last 6 months.

Use this if you need to extract specific information from PDFs and want to quickly compare tools based on their ability to handle complex layouts like tables, equations, or multi-column text.

Not ideal if you are looking for a tool to simply view PDFs or if your PDFs contain only basic text without any complex formatting.

document-processing data-extraction research-data-capture information-retrieval content-digitization

No License · Stale 6m · No Package · No Dependents

Maintenance 2 / 25

Adoption 10 / 25

Maturity 7 / 25

Community 5 / 25


Stars 177

Forks

Language —

License —

Higher-rated alternatives

google/langextract

A Python library for extracting structured information from unstructured text using LLMs with...

Extralit/extralit

Fast and accurate systemic data extraction with LLM assistance

Keyvanhardani/german-ocr

German-OCR is specifically trained to extract text from German documents including invoices,...

oidlabs-com/Lexoid

Multimodal document parser for high quality data understanding and extraction

xingbow/SciDaEx

Structured data extraction from research literature
