davidmoserai/AzureDocumentIntelligenceChunker
A lightweight Python library for metadata-rich document chunking in Retrieval-Augmented Generation (RAG) workflows. It leverages Azure AI Document Intelligence to enhance chunking by retaining hierarchical structure, page numbers, and bounding boxes for seamless integration with PDF viewers.
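To make "metadata-rich" concrete, here is a minimal sketch of what a chunk that keeps heading hierarchy, page numbers, and bounding boxes could look like. The names `BoundingBox`, `Chunk`, `heading_path`, and `pages` are hypothetical and chosen for illustration only; they are not taken from this library's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical rectangle locating a piece of text on one page."""
    page_number: int  # 1-based page index (an assumption of this sketch)
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class Chunk:
    """Hypothetical chunk record; not the library's real data model."""
    text: str
    heading_path: list[str] = field(default_factory=list)  # e.g. ["1 Intro", "1.2 Scope"]
    boxes: list[BoundingBox] = field(default_factory=list)

    @property
    def pages(self) -> list[int]:
        """Sorted, de-duplicated page numbers this chunk touches."""
        return sorted({box.page_number for box in self.boxes})


chunk = Chunk(
    text="Payment is due within 30 days.",
    heading_path=["2 Terms", "2.1 Payment"],
    boxes=[
        BoundingBox(page_number=4, x0=1.0, y0=1.0, x1=7.5, y1=1.4),
        BoundingBox(page_number=3, x0=1.0, y0=9.8, x1=7.5, y1=10.2),
    ],
)
```

With a record like this, a retrieval result can carry enough information for a PDF viewer to jump to the right page and highlight the source region.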
No commits in the last 6 months.
Stars: 2
Forks: —
Language: Python
License: —
Category:
Last pushed: Jan 11, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/davidmoserai/AzureDocumentIntelligenceChunker"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
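The same request can be made from Python with only the standard library. This is a sketch under stated assumptions: the helper names `build_quality_url` and `fetch_quality` are made up for illustration, the `rag` path segment is assumed to be a category slug (inferred from the URL above), and the response schema is not documented here, so the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for an 'owner/name' repository slug.

    Treating the first path segment as a category is an assumption.
    """
    owner, name = repo.split("/", 1)
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(name)}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> str:
    """Send a keyless GET request and return the response body as text."""
    request = Request(build_quality_url(category, repo))
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (performs a network call, so it is left commented out):
# print(fetch_quality("rag", "davidmoserai/AzureDocumentIntelligenceChunker"))
```

Because the keyless tier is limited to 100 requests per day, a caller polling many repositories would likely want to cache responses locally.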
Higher-rated alternatives
chonkie-inc/chonkie
🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust...
speedyk-005/chunklet-py
One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs,...
jchunk-io/jchunk
JChunk is a lightweight and flexible library designed to provide multiple strategies for text...
andreshere00/Splitter_MR
Chunk your data into markdown text blocks for your LLM applications
chonkie-inc/chonkiejs
🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library