chonkie-inc/chonkie
🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines
Chonkie is a lightweight library for developers building Retrieval-Augmented Generation (RAG) applications. It takes text data in various forms, splits it into smaller, meaningful parts (chunks), and then refines and embeds those chunks. The output is a set of text chunks ready to be stored in a vector database, where they can be retrieved efficiently as context for large language models.
3,829 stars. Used by 15 other packages. Actively maintained with 82 commits in the last 30 days. Available on PyPI.
Use this if you are a developer building RAG applications and need a fast, efficient, and flexible way to prepare diverse text data for embedding and storage in vector databases.
Not ideal if you are looking for a ready-made end-user application: this is a library for technical implementation, and using it means writing code.
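To make the chunking step described above concrete, here is a minimal sketch of fixed-size chunking with overlap, written in plain Python. It does not use Chonkie's API; the `Chunk` class, the `chunk_words` function, and the use of whitespace-separated words as a stand-in for tokens are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical container for one chunk (not a Chonkie type)."""
    text: str
    start_word: int  # index of the first word taken from the source text
    end_word: int    # index one past the last word taken


def chunk_words(text: str, chunk_size: int = 5, overlap: int = 2) -> list[Chunk]:
    """Split text into overlapping windows of whitespace-separated words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[Chunk] = []
    for start in range(0, len(words), step):
        end = min(start + chunk_size, len(words))
        chunks.append(Chunk(" ".join(words[start:end]), start, end))
        if end == len(words):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h", chunk_size=4, overlap=1):
        print(chunk)
```

The overlap means neighbouring chunks share a few words, so a sentence that straddles a boundary still appears in full in at least one chunk. A real pipeline would typically count tokens with the embedding model's tokenizer rather than splitting on whitespace.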
Stars: 3,829
Forks: 256
Language: Python
License: MIT
Category:
Last pushed: Mar 12, 2026
Commits (30d): 82
Dependencies: 4
Reverse dependents: 15
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/chonkie-inc/chonkie"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
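The same request can be made from Python with only the standard library. This is a rough sketch: the URL is taken from the curl command above, but the reading of its path as category, owner, and repository is an assumption, and the helper names `build_quality_url` and `fetch_quality_text` are hypothetical. The response format is not documented here, so the body is returned as raw text instead of being parsed.

```python
import urllib.request

# Base path copied from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (assumed path layout)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality_text(category: str, owner: str, repo: str,
                       timeout: float = 10.0) -> str:
    """Fetch the endpoint without an API key and return the raw body."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Performs a real network request; counts against the daily quota.
    print(fetch_quality_text("rag", "chonkie-inc", "chonkie"))
```

Because the keyless tier allows 100 requests per day, it is worth caching the result locally when polling more than a handful of repositories.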
Related tools
speedyk-005/chunklet-py
One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs,...
jchunk-io/jchunk
JChunk is a lightweight and flexible library designed to provide multiple strategies for text...
andreshere00/Splitter_MR
Chunk your data into markdown text blocks for your LLM applications
chonkie-inc/chonkiejs
🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library
thom-heinrich/chonkify
Extractive document compression for RAG and agent pipelines. +69% vs LLMLingua, +175% vs...