Document Intelligence Extraction ML Frameworks
Tools for extracting, analyzing, and structuring data from documents (PDFs, images, administrative files) using OCR, deep learning, and NLP. Includes document management, parsing, and information retrieval. Does NOT include general document conversion, presentation generation, or book production/typesetting.
There are 79 document intelligence extraction frameworks tracked. Three score above 50 (Established tier). The highest-rated is paperless-ngx/paperless-ngx at 66/100 with 37,318 stars. One of the top 10 is actively maintained.
Get the projects as JSON (this example request sets `limit=20`):
```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=document-intelligence-extraction&limit=20"
```
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
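The curl request can also be issued from a script. Below is a minimal Python sketch using only the standard library. It rebuilds the same query URL, fetches it, and tallies projects by tier. The helper names (`build_url`, `fetch_projects`, `count_tiers`) are hypothetical, and the response shape is an assumption: the listing does not document it, so the code assumes a JSON array of objects that each carry a `tier` field.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Endpoint taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the query URL with the same parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url: str, timeout: float = 10.0) -> list:
    """GET the URL without an API key and decode the JSON body.

    Assumption: the body is a JSON array of project records.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def count_tiers(projects: list) -> Counter:
    """Count projects per tier.

    Assumption: each record has a hypothetical "tier" key; records
    without one are counted as "unknown".
    """
    return Counter(p.get("tier", "unknown") for p in projects)
```

Calling `count_tiers(fetch_projects(build_url("ml-frameworks", "document-intelligence-extraction")))` makes a live, unauthenticated request that counts against the 100 requests/day quota. If the real response wraps the list in an envelope object, unwrap it before passing it to `count_tiers`.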
| # | Framework | Description | Tier |
|---|---|---|---|
| 1 | paperless-ngx/paperless-ngx | A community-supported supercharged document management system: scan, index... | Established |
| 2 | GoogleCloudPlatform/document-ai-samples | Sample applications and demos for Document AI, the end-to-end document... | Established |
| 3 | aws-solutions/document-understanding-solution | Example of integrating & using Amazon Textract, Amazon Comprehend, Amazon... | Established |
| 4 | naiveHobo/InvoiceNet | Deep neural network to extract intelligent information from invoice documents. | Emerging |
| 5 | aphp/edspdf | EDS-PDF is a generic, pure-Python framework for text extraction from PDF... | Emerging |
| 6 | jonaswinkler/paperless-ng | A supercharged version of paperless: scan, index and archive all your... | Emerging |
| 7 | jennis0/burdoc | Advanced PDF parsing for Python | Emerging |
| 8 | ptmrio/autorename-pdf | autorename-pdf is a highly efficient tool designed to automatically rename... | Emerging |
| 9 | vladzima/neuronaming-dev | Open-source: AI powered business names generator. Proof of concept. | Emerging |
| 10 | StabRise/ScaleDP | ScaleDP is an Open-Source extension of Apache Spark for Document Processing | Emerging |
| 11 | AkshayG999/MistralOCR---AI-Powered-Document-Extraction | MistralOCR is an open-source application that transforms documents into... | Emerging |
| 12 | study-assist/browser-extension | A tool to help you organise your bookmarks intelligently | Emerging |
| 13 | MSUSAzureAccelerators/Intelligent-Document-Processing-Accelerator | Showcase Azure platform’s machine learning capability to recognize document... | Emerging |
| 14 | Unstructured-IO/community | Open source libraries and APIs to build custom preprocessing pipelines for... | Emerging |
| 15 | HT0710/Receipt-Information-Extraction | Receipt-Information-Extraction | Emerging |
| 16 | adhorn/poliko | Demo web applications that use AWS Artificial Intelligence services... | Emerging |
| 17 | akshata29/digitalclaims | Microsoft Insurance Claims Automation, powered by AI, handles claim... | Emerging |
| 18 | FzS92/smart-pdf-highlighter | Automatically identify and highlight key content within PDF files using... | Emerging |
| 19 | shubhigupta991/PaperTxT | We plan to create an AI which has analytical reading and answering... | Emerging |
| 20 | aldawsarir/Vortex | AI-powered visual search and document understanding system that transforms... | Emerging |
| 21 | lucasjvds/Scanipy | Scanipy stands for "scan it with Python": it's your smart Python library for... | Emerging |
| 22 | ShaunakSen/AI-for-Web-Accessibility | This is the GitHub repository for my Masters dissertation titled: Artificial... | Experimental |
| 23 | Uli-Z/autoPDFtagger | autoPDFtagger is a Python tool designed for efficient home-office... | Experimental |
| 24 | Yashsonaar/LayoutLMv3-Fine-Tuning | Welcome to the LayoutLMv3 Fine-Tuning project! 🚀 This project focuses on... | Experimental |
| 25 | shawnacontrary24/DocStripper | 🧹 Clean up your documents with DocStripper, the AI-powered tool that removes... | Experimental |
| 26 | ypratap11/invoice-processing-ai | AI-powered invoice processing system using Google Document AI - Automated AP... | Experimental |
| 27 | butlerlabs/docai | DocAI helps developers quickly build document, image and text processing... | Experimental |
| 28 | hrushikesh009/TensorFlow-OCR-Invoice-Extractor | A TensorFlow OCR solution, leveraging advanced object detection models like... | Experimental |
| 29 | dev-luckymhz/AIVisionText-invoice-OCR-typescript | AIVisionText is an advanced document analysis platform that harnesses the... | Experimental |
| 30 | halilxibrahim/ai-logo-generator-webapp | AI Logo generator Web App | Experimental |
| 31 | danielbusnz-lgtm/inkvault | AI-powered document processing pipeline with Claude, FastAPI, and AWS | Experimental |
| 32 | shreastharaj/PasteClip | Manage and access your macOS clipboard history with PasteClip, a lightweight... | Experimental |
| 33 | MukundaKatta/SketchFlow | Wireframe-to-code converter: generate HTML/CSS from structured component... | Experimental |
| 34 | Bharathyalagi/OCR-Document-parser | Smart OCR application built with Tesseract and Streamlit that extracts... | Experimental |
| 35 | JuanCS-Dev/typecraft | AI-Powered Book Production Engine - Transform manuscripts into... | Experimental |
| 36 | Outofplace-tobacconist674/deeplens | Analyze EVM blockchain data on-chain to provide clear intelligence and... | Experimental |
| 37 | MukundaKatta/ClipBoard | Clipboard history manager: smart snippets with search, tagging, content... | Experimental |
| 38 | rogue-agent1/htmlstrip | htmlstrip - Strip HTML tags and extract text content | Experimental |
| 39 | Biellgrimm/itbaa | 📄 Convert HTML to high-quality PDF with Itbaa, supporting vector output,... | Experimental |
| 40 | Mato989086/AI-INVOICE-OCR-ENGINE | 🤖 Streamline invoice processing with this AI-powered OCR engine for accurate... | Experimental |
| 41 | sangpham06112004/ScanForge | 🛠️ Simplify and automate code scanning to enhance security and streamline... | Experimental |
| 42 | Bilal-03/invoice-extraction | AI-powered invoice data extraction using Computer Vision and NLP. Automates... | Experimental |
| 43 | cstroie/DocMindAI | A comprehensive PHP-based AI toolkit for intelligent document processing and... | Experimental |
| 44 | Deathfrosthacker/Accessibility-Text-Enhancer | ✨ Enhance web accessibility in real-time with this browser extension that... | Experimental |
| 45 | itssharmaXD/numbers-le | 🔢 Extract numbers swiftly from JSON, YAML, CSV, TOML, INI, and ENV files at... | Experimental |
| 46 | NhanPhamThanh-IT/Scan-PDF-Paper | Advanced document analysis platform that extracts text from PDF, DOCX, and... | Experimental |
| 47 | conditionedstimulus/DocumentClassifier | FastAPI application for document classification using a multimodal LayoutLM... | Experimental |
| 48 | reisel-g/doc2dataset | 📄 Ingest documents into structured datasets for LLMs, ensuring numeric... | Experimental |
| 49 | Aid-On/templex | Template Extractor - Extract abstract templates and document structures from... | Experimental |
| 50 | amr122deqw/google-form-history | 📝 Track your Google Form responses easily with this Chrome extension,... | Experimental |
| 51 | 6825972/a11y-tw-audit-skill | Audit Taiwan websites for accessibility issues using WCAG 2.2 AA and local... | Experimental |
| 52 | sjvrensburg/railreader2 | Desktop PDF viewer optimised for high magnification viewing. | Experimental |
| 53 | graceytl/ai-receipt-data-extraction | AI & ML research project for automatic product extraction, classification,... | Experimental |
| 54 | onify/blueprint-aws-textract-pdf-to-form | Onify Blueprint: Amazon AWS Textract - PDF to form example | Experimental |
| 55 | ICan-js/ICan.js | Library for adding more accessibility to web pages through... | Experimental |
| 56 | ChanMeng666/emoji-story-generator | 【Sprinkle some star dust on this repo!⭐️】An interactive web application that... | Experimental |
| 57 | man2k/AI-PDFReader | AI PDF Reader | Experimental |
| 58 | sarawagh27/smart-ai-file-organizer | AI-powered file organizer that automatically classifies and moves PDF, DOCX,... | Experimental |
| 59 | Komorebirumu/awe-ms-20260326-1002-00 | AI Personalized Children's Stories & Images | Experimental |
| 60 | ruban-ai/deep-learning-accessibility-audit | Deep learning system for automated accessibility analysis of digital content... | Experimental |
| 61 | FajarSangTrader/text-feature-span-extractor | 📄 Extract features from invoices using a robust text-layer span extractor.... | Experimental |
| 62 | anuja024/AI-ddr-report-generator | AI-powered system that generates automated Defect Detection Reports (DDR)... | Experimental |
| 63 | Zakwani123/rihal-docfusion | Receipt extraction and anomaly detection pipeline: OCR, Random Forest, Streamlit UI | Experimental |
| 64 | Garendra/qwen3-2b-ocr-app | 🖼️ Extract text from PDF documents using Qwen3-2B-VL with a Docker setup and... | Experimental |
| 65 | Stravinskyopticalglass907/papertrail | Extract key insights from PDFs page by page with AI-powered summaries and... | Experimental |
| 66 | angelpro17/Media-AI-Processor | Media-AI-Processor is a scalable media processing engine built with FastAPI... | Experimental |
| 67 | Avielzi/ScanMaster-AI | ScanMaster AI | Experimental |
| 68 | Traviseric/parallel-book-generation | Parallel AI Book Generation Architecture - Generate complete books in under... | Experimental |
| 69 | THILLAINATARAJAN-B/Maptizer | Geo-AI platform for real-time location intelligence, business viability, and... | Experimental |
| 70 | BluShooz/nauknauk-clone | AI-powered platform that transforms toy/figure photos into animated videos.... | Experimental |
| 71 | Muhib-Hasan/invoice-processor | 📄 Process Vietnam e-invoices seamlessly with multi-format support and a... | Experimental |
| 72 | ayamekni/AdminDoc-X | AdminDoc-X is an AI-powered document intelligence platform for extracting,... | Experimental |
| 73 | abishekmuthian/dsc-automation | Automate disability support committee in universities. | Experimental |
| 74 | shubham001official/lenscorp | 🚀 Cutting-edge AI services for businesses: biometrics 👤, image analysis 📸,... | Experimental |
| 75 | mariogarcia-ar/Document-Digitization | Document Digitization | Experimental |
| 76 | theoliverlear/AI-Baby-Name-Generator | A service using artificial intelligence to provide the world with a new era of names. | Experimental |
| 77 | rahmansahinler1/vector_similarity_search | An intuitive work for vector similarity search with Faiss on sentence embeddings. | Experimental |
| 78 | IbrahimBaAta/bioLens | Deep Learning enthusiast passionate about computer vision and AI-driven... | Experimental |
| 79 | Specii/Genetexium | Genetexium API empowers developers with AI-driven tools for content... | Experimental |