enoch3712/ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.


Emerging

ExtractThinker automatically extracts specific information from documents such as PDFs, images, and spreadsheets. You define the fields you need (e.g., invoice numbers, dates), and it uses an LLM to find them and return them in a structured format. It is aimed at developers and data engineers building applications that need automated document understanding and data extraction from diverse sources.
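The define-then-extract pattern described above can be sketched in plain Python. This is a minimal illustration using only the standard library, not ExtractThinker's actual API: the `Invoice` schema, the `extract` helper, and the `fake_llm` stub are all hypothetical names, and the stub returns a canned JSON reply in place of a real model call.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Invoice:
    """Hypothetical schema: the fields we want pulled out of a document."""
    invoice_number: str
    invoice_date: str


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; always returns the same canned JSON."""
    return json.dumps({
        "invoice_number": "INV-001",
        "invoice_date": "2024-01-15",
        "note": "extra keys are ignored by extract()",
    })


def extract(text: str, schema, llm=fake_llm):
    """Ask the model for the schema's fields and build a typed object."""
    names = [f.name for f in fields(schema)]
    prompt = f"Return JSON with keys {names} for this document:\n{text}"
    raw = json.loads(llm(prompt))
    # Fail loudly if the model's reply omits a requested field.
    missing = [n for n in names if n not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Keep only the fields the schema declares.
    return schema(**{n: raw[n] for n in names})


invoice = extract("Invoice INV-001, dated 2024-01-15", Invoice)
print(invoice)
```

In a real pipeline the stub would be replaced by a model call and the document text would come from a loader for the file type; the schema-driven validation step is the part this sketch is meant to show.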

1,492 stars. No commits in the last 6 months.

Use this if you are a developer building applications that need to precisely extract structured data or classify document types from a wide range of file formats using large language models.

Not ideal if you are an end-user looking for a no-code solution or a simple drag-and-drop tool to extract data without programming.

document-processing data-extraction AI-integration workflow-automation LLM-development

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 20 / 25


Stars 1,492

Forks 145

Language Python

License Apache-2.0

Higher-rated alternatives

NanoNets/docstrange

Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple...

th1nhhdk/local_ai_ocr

A local, offline (after initial setup), portable OCR software that can process images and PDF...

Dicklesworthstone/llm_aided_ocr

Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking,...

emcf/thepipe

Get clean data from tricky documents, powered by vision-language models ⚡

langstruct-ai/langstruct

Extract structured data from any content using LLMs.
