jamesmcroft/azure-document-intelligence-markdown-to-openai-data-extraction-sample
This sample demonstrates how to use Document Intelligence's Layout model to convert a PDF document, such as invoices, into Markdown, then use GPT-3.5 Turbo to extract structured JSON data using the Azure OpenAI Service.
Quickly extract structured data from various documents like invoices, PDFs, Word files, or even images. It takes your unstructured document content and converts it into a clean, organized JSON format, ready for your databases or other systems. This is ideal for business analysts, operations managers, or anyone needing to automate data entry from diverse document types.
No commits in the last 6 months.
Use this if you need to reliably pull specific, structured information from a wide variety of document formats without having to manually train a custom model for each document type.
Not ideal if you only work with a very limited set of highly standardized documents and prefer to build a custom extraction model from scratch for maximum control.
Stars
31
Forks
12
Language
Jupyter Notebook
License
—
Category
Last pushed
May 01, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jamesmcroft/azure-document-intelligence-markdown-to-openai-data-extraction-sample"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
NanoNets/docstrange
Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple...
th1nhhdk/local_ai_ocr
An local, offline (after initial setup), portable OCR software that can process images and PDF...
Dicklesworthstone/llm_aided_ocr
Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking,...
emcf/thepipe
Get clean data from tricky documents, powered by vision-language models ⚡
langstruct-ai/langstruct
Extract structured data from any content using LLMs.