mohanbing/st_doc_ext

This repository contains the code for an information extraction app that uses LangChain to extract structured output from unstructured data according to a given schema.

/ 100

Emerging

This tool helps you quickly pull specific details from documents like PDFs or images. You provide your unstructured text or image, define what information you're looking for (like a template), and it extracts that data into a clean, organized format. It's ideal for anyone who regularly sifts through documents to find key pieces of information, such as researchers, legal professionals, or administrative staff.
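The workflow described above (define a template, then pull matching fields into structured output) can be sketched roughly as follows. This is not the repository's code and does not use LangChain; it is a stdlib-only illustration of the general schema-driven extraction pattern, under the assumption that the model is reachable through some function that takes a prompt and returns a JSON string. The names `InvoiceFields`, `build_prompt`, `extract`, and `fake_llm` are all hypothetical, and `fake_llm` stands in for a real model call.

```python
import json
from dataclasses import dataclass, fields
from typing import Callable, Type, TypeVar

T = TypeVar("T")


@dataclass
class InvoiceFields:
    """Hypothetical example schema: the data points to pull from a document."""
    invoice_number: str
    vendor: str
    total: float


def build_prompt(schema: Type, text: str) -> str:
    """Describe each schema field so the model knows which JSON keys to return."""
    spec = ", ".join(f'"{f.name}" ({f.type.__name__})' for f in fields(schema))
    return (
        "Extract the following fields from the document and reply with "
        f"a single JSON object with keys {spec}.\n\nDocument:\n{text}"
    )


def extract(schema: Type[T], text: str, llm: Callable[[str], str]) -> T:
    """Ask the (hypothetical) model for JSON, then coerce it into the schema type."""
    raw = json.loads(llm(build_prompt(schema, text)))
    missing = [f.name for f in fields(schema) if f.name not in raw]
    if missing:
        raise ValueError(f"model output is missing fields: {missing}")
    # Cast each value to its declared field type; fine for simple types like str and float.
    return schema(**{f.name: f.type(raw[f.name]) for f in fields(schema)})


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same canned JSON."""
    return '{"invoice_number": "INV-001", "vendor": "Acme Corp", "total": "1250.00"}'


if __name__ == "__main__":
    invoice = extract(InvoiceFields, "Invoice INV-001 from Acme Corp, total $1,250.00", fake_llm)
    print(invoice)
    # InvoiceFields(invoice_number='INV-001', vendor='Acme Corp', total=1250.0)
```

Passing the model call in as a plain callable keeps the schema handling testable without network access. A LangChain-based app would typically replace `build_prompt`, `json.loads`, and the casting step with the framework's own structured-output tooling.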

No commits in the last 6 months.

Use this if you need to systematically extract predefined data points from various unstructured documents, turning them into a structured output.

Not ideal if you need to analyze the overall sentiment or general content of a document rather than specific, templated information.

data-extraction document-processing information-retrieval workflow-automation content-structuring

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 4 / 25

Maturity 16 / 25

Community 14 / 25


Language: Python

License: Apache-2.0

Higher-rated alternatives

NanoNets/docstrange

Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple...

th1nhhdk/local_ai_ocr

A local, offline (after initial setup), portable OCR software that can process images and PDF...

Dicklesworthstone/llm_aided_ocr

Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking,...

emcf/thepipe

Get clean data from tricky documents, powered by vision-language models ⚡

langstruct-ai/langstruct

Extract structured data from any content using LLMs.
