Document Data Extraction NLP Tools
There are 16 document data extraction tools tracked. 3 score above 50, which places them in the Established tier. The highest-rated is google/langextract at 69/100 with 34,668 stars. Only 1 of the top 10 is actively maintained.
Get all 16 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=document-data-extraction&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
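The same request can be made from Python. The snippet below is a minimal sketch using only the standard library: it rebuilds the query string from the curl command above and parses the response as JSON. The helper names `build_url` and `fetch_projects` are illustrative, not part of the API, and the shape of the returned JSON is not documented here, so the sketch returns the parsed payload without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="document-data-extraction", limit=20):
    """Build the dataset URL with the same query parameters as the curl call."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Send a GET request and return the parsed JSON payload.

    The response structure is an unknown here, so no keys are assumed;
    inspect the returned object before relying on specific fields.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the request; each call counts against the daily request limit.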
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | google/langextract | A Python library for extracting structured information from unstructured... | 69 | Established |
| 2 | Extralit/extralit | Fast and accurate systemic data extraction with LLM assistance | | Established |
| 3 | Keyvanhardani/german-ocr | German-OCR is specifically trained to extract text from German documents... | | Established |
| 4 | oidlabs-com/Lexoid | Multimodal document parser for high quality data understanding and extraction | | Emerging |
| 5 | xingbow/SciDaEx | Structured data extraction from research literature | | Emerging |
| 6 | parsee-ai/parsee-core | Retrieval of fully structured data made easy. Use LLMs or custom models.... | | Emerging |
| 7 | davendw49/sciparser | PDF parsing toolkit for preparing academic text corpus | | Emerging |
| 8 | yaminivibha/LLM_InformationRetrieval | extracting "structured" information that is embedded in natural language... | | Experimental |
| 9 | Danitilahun/Document-processing-Pdf-Structured-Data-Extractor | This project demonstrates how to extract structured information from PDF... | | Experimental |
| 10 | GiftMungmeeprued/document-parsers-list | A comprehensive list of document parsers, covering PDF-to-text conversion... | | Experimental |
| 11 | ycastorium/lextract | LLM-powered text extraction library for Elixir | | Experimental |
| 12 | JannesKlaas/doxstractor | Extract structured data from document in a modular way using NLP and LLMs. | | Experimental |
| 13 | awalz92/schema-extract-deke | Schema-driven structured data extraction from unstructured text using local... | | Experimental |
| 14 | kninepro09/intelligent-document-understanding | 📄 Analyze unstructured documents with an end-to-end NLP system for... | | Experimental |
| 15 | morikaglobal/langextract | Experimenting langextract for possible use case | | Experimental |
| 16 | VianneyMI/amplifai | Amplifai is a package that allows you to transform your raw unstructured... | | Experimental |