aws-samples/layout-aware-document-processing-and-retrieval-augmented-generation

Advanced, layout-aware document extraction and chunking techniques for retrieval-augmented generation (RAG). They increase knowledge retrieval accuracy and give control over how retrieved context is managed.

/ 100

Emerging

This project helps you accurately extract information from complex documents like reports or manuals and prepare it for AI-powered question-answering. It takes multi-page documents (PDFs, images) and outputs structured, context-rich text chunks, including properly formatted tables and lists. This is for professionals like researchers, legal analysts, or operations managers who need to find precise answers within large document repositories.
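The idea of layout-aware chunking described above can be illustrated with a minimal sketch. This is not the repository's code: the `Element` class, the `kind` labels, the `chunk_by_layout` function, and the `max_words` budget are all hypothetical names chosen for illustration, and the input is assumed to be a list of layout elements already produced by some extraction step. The sketch groups elements under their nearest section header, repeats that header at the top of each chunk for context, and never splits a table or list across chunks.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str  # hypothetical labels: "header", "text", "table", "list"
    text: str


def chunk_by_layout(elements, max_words=50):
    """Group elements under their nearest header; keep each element whole."""
    chunks = []
    header = ""   # most recent section header seen so far
    buffer = []   # element texts collected for the current chunk
    count = 0     # whitespace-separated tokens currently in the buffer

    def flush():
        nonlocal buffer, count
        if buffer:
            body = "\n".join(buffer)
            # Prefix the section header so the chunk carries its own context.
            chunks.append(f"{header}\n{body}" if header else body)
        buffer, count = [], 0

    for el in elements:
        if el.kind == "header":
            flush()            # a new section always starts a new chunk
            header = el.text
            continue
        words = len(el.text.split())
        # Start a new chunk rather than splitting a table, list, or paragraph.
        if buffer and count + words > max_words:
            flush()
        buffer.append(el.text)
        count += words
    flush()
    return chunks


elements = [
    Element("header", "1. Pricing"),
    Element("text", "Plans are billed monthly."),
    Element("table", "| Plan | Price |\n| Basic | $10 |"),
    Element("header", "2. Support"),
    Element("list", "- Email\n- Phone"),
]
chunks = chunk_by_layout(elements, max_words=8)
for chunk in chunks:
    print(chunk)
    print("---")
```

With the small `max_words` budget used here, the table lands in its own chunk but still begins with the "1. Pricing" header, so a retriever that returns only that chunk keeps the section context. An element larger than the budget is emitted whole instead of being cut.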

115 stars.

Use this if you need to extract and organize detailed information from documents, including tables and lists, to power highly accurate AI systems that answer questions based on your specific content.

Not ideal if you only need simple text extraction without regard for document layout, tables, or complex hierarchical structures, or if you don't plan to use the extracted data for advanced AI retrieval systems.

document-intelligence information-extraction knowledge-management enterprise-search report-analysis

No Package, No Dependents

Maintenance 6 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 14 / 25


Stars 115

Forks

Language Jupyter Notebook

License MIT-0

Higher-rated alternatives

yichuan-w/LEANN

[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast,...

byerlikaya/SmartRAG

Multi-Modal RAG for .NET — query databases, documents, images and audio in natural language....

sourangshupal/simple-rag-langchain

Exploring the Basics of Langchain

sion42x/llama-index-milvus-example

Open AI APIs with Llama Index and Milvus Vector DB for Retrieval Augmented Generation (RAG) testing

Maverick0351a/neuralcache

NeuralCache is a drop-in reranker for Retrieval-Augmented Generation (RAG) that learns which...
