Document Intelligence Extraction NLP Tools
Nine document intelligence extraction tools are tracked. One scores above 50 (Established tier). The highest-rated is pd3f/pd3f at 52/100 with 330 stars.
Get all 9 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=document-intelligence-extraction&limit=20"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
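The same request can be made from a script. The following is a minimal sketch using only the Python standard library; it reuses the endpoint and query parameters from the curl command above. The helper names `build_url` and `fetch_tools` are hypothetical, and because the response schema is not documented on this page, the payload is returned as-is without assuming any field names.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="document-intelligence-extraction", limit=20):
    """Return the listing URL with the same query parameters as the curl command."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_tools(limit=20, timeout=10):
    """Request the listing and return the decoded JSON payload unchanged.

    The response schema is not documented here, so no keys are assumed.
    """
    with urllib.request.urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_tools()` performs one network request, which counts against the daily request limit. Inspect the returned object before relying on any particular structure.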
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | pd3f/pd3f<br>🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based | 52 | Established |
| 2 | kiku-jw/DocStripper<br>🧹 DocStripper is a lightweight CLI utility that automatically cleans text documents | | Emerging |
| 3 | climate-nlp/reportparse<br>ReportParse is a unified NLP analyzer for corporate sustainability reports | | Emerging |
| 4 | jwc524/clippy<br>A smart PDF reader that extracts text and generates headings and summaries... | | Emerging |
| 5 | TheAkshatGupta/Intelligent-Document-Parsing-FinTech<br>NLP-based system to extract structured information from financial documents | | Experimental |
| 6 | mlemineb/Document-Analyzer-App<br>A shiny application that analyzes financial documents (pdf format) using NLP... | | Experimental |
| 7 | UnderTheTableHTV7/simplai_HTV7<br>A website application that uses NLP and Artificial Intelligence to recognize... | | Experimental |
| 8 | stochastic-sisyphus/text-feature-span-extractor<br>Deterministic invoice extraction using native PDF text layers. No OCR... | | Experimental |
| 9 | ArevikKH/PDF-Summarizer-Multilang-OCR<br>AI-powered system for summarizing PDF content with Armenian, Russian, and... | | Experimental |