Document Intelligence Extraction ML Frameworks

Tools for extracting, analyzing, and structuring data from documents (PDFs, images, administrative files) using OCR, deep learning, and NLP. Includes document management, parsing, and information retrieval. Does NOT include general document conversion, presentation generation, or book production/typesetting.

There are 79 document intelligence extraction frameworks tracked. 3 score 50 or higher (established tier). The highest-rated is paperless-ngx/paperless-ngx at 66/100 with 37,318 stars. 1 of the top 10 is actively maintained.
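The tier labels in the list below appear to follow fixed score bands. This is a minimal sketch that infers those bands from the listing itself, not from any published rule: scores of 50 or more are labeled Established, 30 to 49 Emerging, and 28 or below Experimental. No project scores 29, so putting the Emerging cutoff at 30 is an assumption, and tier is a hypothetical helper, not part of the tracker's API.

```shell
#!/bin/sh
# Hypothetical helper: map a 0-100 quality score to the tier labels seen in this listing.
# Thresholds are inferred from the table, not documented by the tracker:
#   >= 50  -> Established (observed: 66, 60, 50)
#   30-49  -> Emerging    (observed: 49 down to 30)
#   < 30   -> Experimental (observed: 28 down to 11; 29 never appears, so this cutoff is assumed)
tier() {
  score=$1
  if [ "$score" -ge 50 ]; then
    echo "Established"
  elif [ "$score" -ge 30 ]; then
    echo "Emerging"
  else
    echo "Experimental"
  fi
}
```

For example, running tier 44 prints Emerging, which matches the label on entries 6 to 8.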

Get the projects as JSON (the example request below sets limit=20, so it returns at most 20 of the 79):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=document-intelligence-extraction&limit=20"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
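If you script the request above, it can help to build the URL from its parts and only touch the network on demand. This is a sketch under stated assumptions: the query parameters are copied from the curl command above, while the LIMIT override, the FETCH switch, build_url, and the quality.json output file are hypothetical conveniences of this example. Whether the API accepts a limit above 20 (for example 79 to cover every tracked project) is not stated on this page, and how the free key is passed is not described here, so the sketch sends anonymous requests only.

```shell
#!/bin/sh
# Parameterised version of the request above; values mirror its query string.
BASE_URL="https://pt-edge.onrender.com/api/v1/datasets/quality"
DOMAIN="ml-frameworks"
SUBCATEGORY="document-intelligence-extraction"
# Hypothetical override: LIMIT=79 ./fetch.sh (API support for values above 20 is assumed, not documented here).
LIMIT="${LIMIT:-20}"

# Print the full request URL so it can be inspected without a network call.
build_url() {
  printf '%s?domain=%s&subcategory=%s&limit=%s\n' \
    "$BASE_URL" "$DOMAIN" "$SUBCATEGORY" "$LIMIT"
}

# Only fetch when explicitly asked (FETCH=1), to conserve the anonymous 100 requests/day quota.
if [ "${FETCH:-0}" = "1" ]; then
  # -f: exit non-zero on HTTP status 400 or above instead of saving an error body.
  # -sS: hide the progress meter but still print errors.
  curl -fsS "$(build_url)" -o quality.json
fi
```

Running FETCH=1 sh fetch.sh would save the response to quality.json; without FETCH=1 the script only defines build_url.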

Rank  Framework  Score  Tier
1 paperless-ngx/paperless-ngx

A community-supported supercharged document management system: scan, index...

66
Established
2 GoogleCloudPlatform/document-ai-samples

Sample applications and demos for Document AI, the end-to-end document...

60
Established
3 aws-solutions/document-understanding-solution

Example of integrating & using Amazon Textract, Amazon Comprehend, Amazon...

50
Established
4 naiveHobo/InvoiceNet

Deep neural network to extract intelligent information from invoice documents.

49
Emerging
5 aphp/edspdf

EDS-PDF is a generic, pure-Python framework for text extraction from PDF...

45
Emerging
6 jonaswinkler/paperless-ng

A supercharged version of paperless: scan, index and archive all your...

44
Emerging
7 jennis0/burdoc

Advanced PDF parsing for python

44
Emerging
8 ptmrio/autorename-pdf

autorename-pdf is a highly efficient tool designed to automatically rename...

44
Emerging
9 vladzima/neuronaming-dev

Open-source: AI powered business names generator. Proof of concept.

42
Emerging
10 StabRise/ScaleDP

ScaleDP is an Open-Source extension of Apache Spark for Document Processing

42
Emerging
11 AkshayG999/MistralOCR---AI-Powered-Document-Extraction

MistralOCR is an open-source application that transforms documents into...

40
Emerging
12 study-assist/browser-extension

A tool to help you organise your bookmarks intelligently

40
Emerging
13 MSUSAzureAccelerators/Intelligent-Document-Processing-Accelerator

Showcase Azure platform’s machine learning capability to recognize document...

39
Emerging
14 Unstructured-IO/community

Open source libraries and APIs to build custom preprocessing pipelines for...

39
Emerging
15 HT0710/Receipt-Information-Extraction

Receipt-Information-Extraction

39
Emerging
16 adhorn/poliko

Demo web applications that use AWS Artificial Intelligence services ...

33
Emerging
17 akshata29/digitalclaims

Microsoft Insurance Claims Automation, powered by AI, handles claim...

33
Emerging
18 FzS92/smart-pdf-highlighter

Automatically identify and highlight key content within PDF files using...

31
Emerging
19 shubhigupta991/PaperTxT

We plan to create an AI which has analytical reading and answering...

31
Emerging
20 aldawsarir/Vortex

AI-powered visual search and document understanding system that transforms...

30
Emerging
21 lucasjvds/Scanipy

Scanipy stands for "scan it with Python"—it's your smart Python library for...

30
Emerging
22 ShaunakSen/AI-for-Web-Accessibility

This is the GitHub repository for my Masters dissertation titled: Artificial...

28
Experimental
23 Uli-Z/autoPDFtagger

autoPDFtagger is a Python tool designed for efficient home-office...

28
Experimental
24 Yashsonaar/LayoutLMv3-Fine-Tuning

Welcome to the LayoutLMv3 Fine-Tuning project! 🚀 This project focuses on...

27
Experimental
25 shawnacontrary24/DocStripper

🧹 Clean up your documents with DocStripper, the AI-powered tool that removes...

27
Experimental
26 ypratap11/invoice-processing-ai

AI-powered invoice processing system using Google Document AI - Automated AP...

27
Experimental
27 butlerlabs/docai

DocAI helps developers quickly build document, image and text processing...

26
Experimental
28 hrushikesh009/TensorFlow-OCR-Invoice-Extractor

A TensorFlow OCR solution, leveraging advanced object detection models like...

25
Experimental
29 dev-luckymhz/AIVisionText-invoice-OCR-typescript

AIVisionText is an advanced document analysis platform that harnesses the...

25
Experimental
30 halilxibrahim/ai-logo-generator-webapp

AI Logo generator Web App

23
Experimental
31 danielbusnz-lgtm/inkvault

AI-powered document processing pipeline with Claude, FastAPI, and AWS

23
Experimental
32 shreastharaj/PasteClip

Manage and access your macOS clipboard history with PasteClip, a lightweight...

22
Experimental
33 MukundaKatta/SketchFlow

Wireframe-to-code converter — generate HTML/CSS from structured component...

22
Experimental
34 Bharathyalagi/OCR-Document-parser

Smart OCR application built with Tesseract and Streamlit that extracts...

22
Experimental
35 JuanCS-Dev/typecraft

AI-Powered Book Production Engine - Transform manuscripts into...

22
Experimental
36 Outofplace-tobacconist674/deeplens

Analyze EVM blockchain data on-chain to provide clear intelligence and...

22
Experimental
37 MukundaKatta/ClipBoard

Clipboard history manager — smart snippets with search, tagging, content...

22
Experimental
38 rogue-agent1/htmlstrip

htmlstrip - Strip HTML tags and extract text content

22
Experimental
39 Biellgrimm/itbaa

📄 Convert HTML to high-quality PDF with Itbaa, supporting vector output,...

22
Experimental
40 Mato989086/AI-INVOICE-OCR-ENGINE

🤖 Streamline invoice processing with this AI-powered OCR engine for accurate...

22
Experimental
41 sangpham06112004/ScanForge

🛠️ Simplify and automate code scanning to enhance security and streamline...

22
Experimental
42 Bilal-03/invoice-extraction

AI-powered invoice data extraction using Computer Vision and NLP. Automates...

22
Experimental
43 cstroie/DocMindAI

A comprehensive PHP-based AI toolkit for intelligent document processing and...

21
Experimental
44 Deathfrosthacker/Accessibility-Text-Enhancer

✨ Enhance web accessibility in real-time with this browser extension that...

21
Experimental
45 itssharmaXD/numbers-le

🔢 Extract numbers swiftly from JSON, YAML, CSV, TOML, INI, and ENV files at...

21
Experimental
46 NhanPhamThanh-IT/Scan-PDF-Paper

Advanced document analysis platform that extracts text from PDF, DOCX, and...

21
Experimental
47 conditionedstimulus/DocumentClassifier

FastAPI application for document classification using a multimodal LayoutLM...

21
Experimental
48 reisel-g/doc2dataset

📄 Ingest documents into structured datasets for LLMs, ensuring numeric...

21
Experimental
49 Aid-On/templex

Template Extractor - Extract abstract templates and document structures from...

21
Experimental
50 amr122deqw/google-form-history

📝 Track your Google Form responses easily with this Chrome extension,...

21
Experimental
51 6825972/a11y-tw-audit-skill

Audit Taiwan websites for accessibility issues using WCAG 2.2 AA and local...

21
Experimental
52 sjvrensburg/railreader2

Desktop PDF viewer optimised for high magnification viewing.

21
Experimental
53 graceytl/ai-receipt-data-extraction

AI & ML research project for automatic product extraction, classification,...

21
Experimental
54 onify/blueprint-aws-textract-pdf-to-form

Onify Blueprint: Amazon AWS Textract - PDF to form example

20
Experimental
55 ICan-js/ICan.js

Library for adding more accessibility to web pages through...

19
Experimental
56 ChanMeng666/emoji-story-generator

【Sprinkle some star dust on this repo!⭐️】An interactive web application that...

18
Experimental
57 man2k/AI-PDFReader

AI PDF Reader

16
Experimental
58 sarawagh27/smart-ai-file-organizer

AI-powered file organizer that automatically classifies and moves PDF, DOCX,...

14
Experimental
59 Komorebirumu/awe-ms-20260326-1002-00

AI Personalized Children's Stories & Images

14
Experimental
60 ruban-ai/deep-learning-accessibility-audit

Deep learning system for automated accessibility analysis of digital content...

14
Experimental
61 FajarSangTrader/text-feature-span-extractor

📄 Extract features from invoices using a robust text-layer span extractor....

14
Experimental
62 anuja024/AI-ddr-report-generator

AI-powered system that generates automated Defect Detection Reports (DDR)...

14
Experimental
63 Zakwani123/rihal-docfusion

Receipt extraction and anomaly detection pipeline — OCR, Random Forest, Streamlit UI

14
Experimental
64 Garendra/qwen3-2b-ocr-app

🖼️ Extract text from PDF documents using Qwen3-2B-VL with a Docker setup and...

14
Experimental
65 Stravinskyopticalglass907/papertrail

Extract key insights from PDFs page by page with AI-powered summaries and...

14
Experimental
66 angelpro17/Media-AI-Processor

Media-AI-Processor is a scalable media processing engine built with FastAPI...

13
Experimental
67 Avielzi/ScanMaster-AI

ScanMaster AI

13
Experimental
68 Traviseric/parallel-book-generation

Parallel AI Book Generation Architecture - Generate complete books in under...

13
Experimental
69 THILLAINATARAJAN-B/Maptizer

Geo-AI platform for real-time location intelligence, business viability, and...

13
Experimental
70 BluShooz/nauknauk-clone

AI-powered platform that transforms toy/figure photos into animated videos....

13
Experimental
71 Muhib-Hasan/invoice-processor

📄 Process Vietnam e-invoices seamlessly with multi-format support and a...

13
Experimental
72 ayamekni/AdminDoc-X

AdminDoc-X is an AI-powered document intelligence platform for extracting,...

12
Experimental
73 abishekmuthian/dsc-automation

Automate disability support committee in universities.

12
Experimental
74 shubham001official/lenscorp

🚀 Cutting-edge AI services for businesses: biometrics 👤, image analysis 📸,...

11
Experimental
75 mariogarcia-ar/Document-Digitization

Document Digitization

11
Experimental
76 theoliverlear/AI-Baby-Name-Generator

A service using artificial intelligence to provide the world with a new era of names.

11
Experimental
77 rahmansahinler1/vector_similarity_search

An intuitive work for vector similarity search with Faiss on sentence embeddings.

11
Experimental
78 IbrahimBaAta/bioLens

Deep Learning enthusiast passionate about computer vision and AI-driven...

11
Experimental
79 Specii/Genetexium

Genetexium API empowers developers with AI-driven tools for content...

11
Experimental

Comparisons in this category