chonkie-inc/chonkie

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines

Verified

Chonkie is a lightweight library for developers building Retrieval-Augmented Generation (RAG) applications. It takes text data in various forms, splits it into smaller, meaningful parts (chunks), then refines and embeds those chunks. The output is a set of chunks ready to be stored in a vector database, so that relevant passages can be retrieved efficiently and passed to a large language model.
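
The splitting step described above can be illustrated with a minimal, library-independent sketch. It does not use Chonkie's API: `chunk_words`, `chunk_size`, and `overlap` are hypothetical names chosen for illustration, and the sketch counts whitespace-separated words, whereas real chunkers typically count tokens and respect sentence or semantic boundaries.

```python
def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that context spanning a
    chunk boundary is not lost. Illustrative only; not Chonkie's API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # otherwise a trailing chunk made only of overlap words would be added.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = "Retrieval works best when each stored passage covers one idea"
    for i, chunk in enumerate(chunk_words(sample, chunk_size=4, overlap=1)):
        print(i, chunk)
```

In a RAG pipeline, each returned chunk would then be embedded and written to a vector database. The overlap trades some storage for a lower chance of cutting a relevant passage in half.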

3,829 stars. Used by 15 other packages. Actively maintained with 82 commits in the last 30 days. Available on PyPI.

Use this if you are a developer building RAG applications and need a fast, efficient, and flexible way to prepare diverse text data for embedding and storage in vector databases.

Not ideal if you need a ready-made end-user application: Chonkie is a library that has to be integrated in code.

RAG development, LLM application development, text preprocessing, vector database integration, AI application engineering

Maintenance 22 / 25

Adoption 15 / 25

Maturity 25 / 25

Community 18 / 25

Stars: 3,829

Forks: 256

Language: Python

License: MIT

Related tools

speedyk-005/chunklet-py

One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs,...

jchunk-io/jchunk

JChunk is a lightweight and flexible library designed to provide multiple strategies for text...

andreshere00/Splitter_MR

Chunk your data into markdown text blocks for your LLM applications

chonkie-inc/chonkiejs

🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library

thom-heinrich/chonkify

Extractive document compression for RAG and agent pipelines. +69% vs LLMLingua, +175% vs...
