aws-samples/layout-aware-document-processing-and-retrieval-augmented-generation
Advanced, layout-aware document extraction and chunking techniques for retrieval augmented generation. They increase knowledge retrieval accuracy and give you control over how retrieved context is managed.
This project helps you accurately extract information from complex documents like reports or manuals and prepare it for AI-powered question-answering. It takes multi-page documents (PDFs, images) and outputs structured, context-rich text chunks, including properly formatted tables and lists. This is for professionals like researchers, legal analysts, or operations managers who need to find precise answers within large document repositories.
Use this if you need to extract and organize detailed information from documents, including tables and lists, to power highly accurate AI systems that answer questions based on your specific content.
Not ideal if you only need simple text extraction without regard for document layout, tables, or complex hierarchical structures, or if you don't plan to use the extracted data for advanced AI retrieval systems.
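To make the idea of layout-aware chunking concrete, here is a minimal Python sketch. It is not the project's implementation: the `Element` type, its `kind` labels, and the `max_chars` limit are all hypothetical, and it assumes an upstream extraction step has already labeled each piece of the document. It shows two behaviors the description mentions: tables and lists stay whole, and each chunk carries its section header for context.

```python
from dataclasses import dataclass


@dataclass
class Element:
    # Hypothetical layout labels: "header", "paragraph", "table", or "list".
    kind: str
    text: str


def chunk_by_layout(elements, max_chars=500):
    """Group labeled elements into chunks prefixed with their section header.

    Tables and lists are emitted as their own chunks and are never split.
    Paragraphs are packed together until max_chars would be exceeded.
    """
    chunks = []
    header = ""
    buffer = []

    def flush():
        # Emit the buffered text as one chunk, prefixed with the current header.
        if buffer:
            body = "\n".join(buffer)
            chunks.append(f"{header}\n{body}" if header else body)
            buffer.clear()

    for el in elements:
        if el.kind == "header":
            flush()            # close the previous section
            header = el.text
        elif el.kind in ("table", "list"):
            flush()            # keep the table or list in a chunk of its own
            buffer.append(el.text)
            flush()
        else:
            size = sum(len(t) + 1 for t in buffer)
            if buffer and size + len(el.text) > max_chars:
                flush()        # start a new chunk under the same header
            buffer.append(el.text)
    flush()
    return chunks


if __name__ == "__main__":
    doc = [
        Element("header", "1. Intro"),
        Element("paragraph", "Hello."),
        Element("table", "| a | b |\n| 1 | 2 |"),
        Element("header", "2. Next"),
        Element("paragraph", "World."),
    ]
    for chunk in chunk_by_layout(doc):
        print(chunk)
        print("---")
```

Repeating the header in every chunk costs a few characters per chunk, but it means a retrieved table or paragraph still says which section it came from.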
Stars: 115
Forks: 14
Language: Jupyter Notebook
License: MIT-0
Category:
Last pushed: Dec 02, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/aws-samples/layout-aware-document-processing-and-retrieval-augmented-generation"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
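The same request can be made from Python. The sketch below uses only the standard library and rests on two assumptions not confirmed by this page: that the three path segments after `/quality/` are a category, an owner, and a repository name (the parameter names are hypothetical), and that the endpoint returns JSON. The response fields are not documented here, so the code prints the parsed body as-is. It sends no API key, since the page does not say how a key is passed.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL; the segment names are an assumption."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record and parse it, assuming the body is JSON."""
    request = urllib.request.Request(
        build_quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    data = fetch_quality(
        "vector-db",
        "aws-samples",
        "layout-aware-document-processing-and-retrieval-augmented-generation",
    )
    print(json.dumps(data, indent=2))
```

With a limit of 100 keyless requests per day, it is worth caching responses locally rather than fetching on every run.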
Higher-rated alternatives
yichuan-w/LEANN
[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast,...
byerlikaya/SmartRAG
Multi-Modal RAG for .NET — query databases, documents, images and audio in natural language....
sourangshupal/simple-rag-langchain
Exploring the Basics of Langchain
sion42x/llama-index-milvus-example
Open AI APIs with Llama Index and Milvus Vector DB for Retrieval Augmented Generation (RAG) testing
Maverick0351a/neuralcache
NeuralCache is a drop-in reranker for Retrieval-Augmented Generation (RAG) that learns which...