Document Intelligence Extraction NLP Tools

Nine document intelligence extraction tools are tracked. One scores above 50 (the Established tier). The highest-rated is pd3f/pd3f at 52/100, with 330 stars.

Get all 9 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=document-intelligence-extraction&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
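The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It reuses the URL and query parameters from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described here, the decoded JSON is returned unchanged. How an API key is supplied is not shown on this page, so the sketch covers keyless access only.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp",
              subcategory="document-intelligence-extraction",
              limit=20):
    """Build the query URL (hypothetical helper, parameters as in the curl call)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10.0):
    """Fetch the URL and decode the JSON body.

    The response structure is an unknown here, so the decoded
    object is returned as-is rather than picked apart.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects(build_url())` performs one request, which counts against the 100 requests/day keyless limit, so caching the result locally is a sensible default.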

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | pd3f/pd3f | 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based | 52 | Established |
| 2 | kiku-jw/DocStripper | 🧹 DocStripper is a lightweight CLI utility that automatically cleans text documents | 38 | Emerging |
| 3 | climate-nlp/reportparse | ReportParse is a unified NLP analyzer for corporate sustainability reports | 34 | Emerging |
| 4 | jwc524/clippy | A smart PDF reader that extracts text and generates headings and summaries... | 33 | Emerging |
| 5 | TheAkshatGupta/Intelligent-Document-Parsing-FinTech | NLP-based system to extract structured information from financial documents | 29 | Experimental |
| 6 | mlemineb/Document-Analyzer-App | A shiny application that analyzes financial documents (pdf format) using NLP... | 23 | Experimental |
| 7 | UnderTheTableHTV7/simplai_HTV7 | A website application that uses NLP and Artificial Intelligence to recognize... | 22 | Experimental |
| 8 | stochastic-sisyphus/text-feature-span-extractor | Deterministic invoice extraction using native PDF text layers. No OCR... | 19 | Experimental |
| 9 | ArevikKH/PDF-Summarizer-Multilang-OCR | AI-powered system for summarizing PDF content with Armenian, Russian, and... | 10 | Experimental |