Document Data Extraction Generative AI Tools

Tools for extracting, parsing, and structuring data from documents (PDFs, images, business cards, invoices, tenders) using OCR and AI. Includes document intelligence, tabular data extraction, and field recognition. Does NOT include document summarization, general document Q&A without structured extraction, or legal/thematic document analysis.

This category tracks 42 document data extraction tools. The highest-rated is gmp007/PropertyExtractor, which scores 38/100 and has 13 stars.

Get all 42 projects as JSON (the example request below sets limit=20)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=generative-ai&subcategory=document-data-extraction&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
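
The same request can be issued from Python. The following is a minimal sketch using only the standard library. The endpoint and query parameters are copied from the curl example; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the sketch only decodes it without assuming any field names. Whether a larger `limit` returns all 42 projects in one response is likewise an assumption that has not been verified.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="generative-ai",
              subcategory="document-data-extraction",
              limit=20):
    """Assemble the query URL (hypothetical helper, not part of the API)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and decode the body as JSON.

    The response schema is an unknown here, so the decoded object
    is returned as-is without picking out any fields.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network call and counts against the daily quota.
    data = fetch_projects(build_url())
    print(json.dumps(data, indent=2))
```

Calling `build_url()` with no arguments reproduces the URL from the curl example; passing a different `limit` only changes that one query parameter.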

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | gmp007/PropertyExtractor | Generative AI-based Software for Material Property and Database Generation | 38 | Emerging |
| 2 | john-ng-hk/Biz-card-scanner | A digital repository for your physical business cards | 34 | Emerging |
| 3 | AdritPal08/universal-web-scraper-using-generative-ai | Effortless Data Extraction, Powered by Generative AI | 34 | Emerging |
| 4 | jWinman91/AI-OCR | An AI-powered but model-agnostic OCR (optical character recognition) tool | 32 | Emerging |
| 5 | ryanmcdonough/lexplore | Tool to allow extraction of data from legal documents | 32 | Emerging |
| 6 | jWinman91/AI-OCR-Frontend | An AI-powered but model-agnostic OCR (optical character recognition) tool (frontend) | 31 | Emerging |
| 7 | thehackersplaybook/thp-ocr | THP-OCR: A simple Gen AI-powered OCR tool. 🍁 | 30 | Emerging |
| 8 | 100ravSingh/ChequeScan | My Gen AI deployment | 29 | Experimental |
| 9 | kaifcoder/Invoice-Query-Tool-using-gemini-ai | This repository contains a Python project that leverages the Gemini Pro... | 28 | Experimental |
| 10 | codedbyasim/Generative-AI-Document-Intelligence-System | Extract and summarise data from PDFs and images using OCR + LLMs. Built with... | 22 | Experimental |
| 11 | viochris/Streamlit-SpendSense | 💸 SpendSense: An AI-powered personal finance tracker built with Streamlit.... | 21 | Experimental |
| 12 | bejranonda/MeterVision | 👁️ MeterVision: Enterprise-grade meter infrastructure management with a... | 21 | Experimental |
| 13 | law4percent/CheckMe | CheckMe eliminates manual paper checking by using a flatbed scanner,... | 21 | Experimental |
| 14 | Akhand-Pratap-Tiwari/Cyber-Alertz-web-scrapping-microservice | Flask app for scraping cybersecurity websites and purifying the raw content... | 19 | Experimental |
| 15 | artyuan/smart-receipt-assistant | Reads market invoices to extract and analyze spending data. Tracks prices of... | 19 | Experimental |
| 16 | kmaurinjones/Housing-Law-Insight | Web application designed to showcase the potential of Data Science and... | 18 | Experimental |
| 17 | MasterChief-ai/AI-Dataset-Analysis-Tool | An AI-powered dataset analysis tool that automatically classifies tasks... | 18 | Experimental |
| 18 | dvp-git/gemini-information-extractor | A simple single interface information extractor app using the latest... | 17 | Experimental |
| 19 | jagratadeb/GenAI-UiPath-TextExtractor | UiPath automation using OCR and GenAI to extract key data from scanned... | 17 | Experimental |
| 20 | codeterrayt/Scalable-Genai-Invoice-PDF-Data-Extractor | Scalable GenAI-powered system to extract structured invoice data from PDFs &... | 17 | Experimental |
| 21 | Wilson0406/Self-Improving-LLM-Agent | A dual-agent, feedback-driven document extraction system using GPT-5 and... | 15 | Experimental |
| 22 | Naresh1401/Intelligent-document-processing | LLM-powered document processing: extract structured data from invoices,... | 14 | Experimental |
| 23 | Anthtrax/AIcheck | 📸 Streamline your study process with AIcheck, a quick job-checking tool that... | 14 | Experimental |
| 24 | Magenta91/test101 | A web application that extracts text from PDF files, processes it using... | 14 | Experimental |
| 25 | Suriya-Prakashar/AI-driven-tender-scrutiny-system-for-NLCI | AI-powered system for NLC India Limited to automate tender scrutiny. Uses... | 13 | Experimental |
| 26 | Chaitanyakrishna294/Myntra_Genai | Myntra review analysis using GenAI | 13 | Experimental |
| 27 | Phoenixcoder-6/po-automation | This project automates the extraction, parsing, and structuring of purchase... | 13 | Experimental |
| 28 | 0ameyasr/DocVal-Mini | Insurance Document Validation with Gemini AI + FastAPI | 13 | Experimental |
| 29 | Debjyoti2004/PhotoCheck-AI | An intelligent web application that instantly verifies if a passport photo... | 13 | Experimental |
| 30 | het953/AI-Web-Scraper | An intelligent web scraping tool built with Streamlit, Selenium, and... | 13 | Experimental |
| 31 | dhcgn/anthropic-paperless-ngx-ocr | AnthropicPaperOCR is a CLI tool that extracts text from PDFs using advanced... | 12 | Experimental |
| 32 | vedant-kalal/AI-Visiting-Card-Extractor | An AI-powered tool that instantly converts business cards into actionable... | 12 | Experimental |
| 33 | ReNothingg/WBcheker | A project for analyzing product reviews from Wildberries using Gemini... | 12 | Experimental |
| 34 | CyranoB/claim_analysis | This project provides a tool to analyze claims made in a webpage or... | 11 | Experimental |
| 35 | RajhansJain/MULTI-LANGUAGE-INVOICE-EXTRACTOR-LLM | AI-powered invoice understanding system using Vision + LLMs (Gemini API).... | 11 | Experimental |
| 36 | etrotta/gemini_easy_extractor | Automatically extract formatted data out of text documents | 11 | Experimental |
| 37 | Siva-Dev-001/Invoice-Pro-using-LLM | A multi-language invoice extractor using Streamlit and LLM | 11 | Experimental |
| 38 | usrtem/ResearchAI | AI-powered document analysis tool for querying content across PDFs, Word... | 11 | Experimental |
| 39 | MITTALBHAVYA/InvoiceDetailsExtractor | Invoice Extraction Application is a Python-based tool built with Streamlit... | 11 | Experimental |
| 40 | TheOwner-glitch/oracle_hcm_metadata_extractor | Python-based command-line tool that extracts Oracle's publicly available HCM... | 10 | Experimental |
| 41 | Rachana-Baldania/multilingual_invoice_extractor_google_gemini | multilingual_invoice_extractor_google_gemini-master | 10 | Experimental |
| 42 | AjayMaan13/smart-script-analyzer | A Streamlit-based AI tool that uses GPT-4 Vision to extract items, totals,... | 10 | Experimental |
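
In the rows listed above, every tool scoring 30 or higher is tagged Emerging and every tool scoring below 30 is tagged Experimental. The sketch below encodes that observation. The cutoff of 30 is inferred from these 42 rows only and is not a documented rule; other tiers may exist for scores outside the range shown here. The names `EMERGING_MIN_SCORE`, `tier_for`, and `mismatches` are hypothetical.

```python
# Cutoff inferred from the listed rows; an assumption, not a documented threshold.
EMERGING_MIN_SCORE = 30


def tier_for(score: int) -> str:
    """Return the tier label that the listed rows suggest for a score."""
    return "Emerging" if score >= EMERGING_MIN_SCORE else "Experimental"


# A few (score, tier) pairs copied from the listing, used as a consistency check.
SAMPLE_ROWS = [
    (38, "Emerging"),
    (30, "Emerging"),
    (29, "Experimental"),
    (10, "Experimental"),
]


def mismatches(rows):
    """List the (score, tier) pairs that disagree with the inferred cutoff."""
    return [(score, tier) for score, tier in rows if tier_for(score) != tier]


if __name__ == "__main__":
    print(mismatches(SAMPLE_ROWS))
```

An empty result from `mismatches` means the sampled rows agree with the inferred cutoff; it does not confirm how the site actually assigns tiers.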

Comparisons in this category