chonkie-inc/chonkie

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines

Score: 80 / 100 (Verified)

Chonkie is a lightweight library for developers building Retrieval-Augmented Generation (RAG) applications. It takes text data in various forms, splits it into smaller, meaningful parts (chunks), and then refines and embeds those chunks. The output is a set of chunks ready to be stored in a vector database for efficient retrieval by large language models.
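To make the splitting step concrete, here is a minimal sketch of fixed-size chunking with overlap, written with the Python standard library only. It is an illustration of the general idea, not Chonkie's implementation or API: the function name `chunk_words`, its parameters, and the choice of whitespace-separated words as the unit are all assumptions made for this example.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that context spanning a
    chunk boundary is not lost. Hypothetical helper for illustration only;
    a real chunker would typically count tokens rather than words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made only of overlap words is emitted.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each returned string would then be passed to an embedding model and stored in a vector database; those steps are outside the scope of this sketch.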

3,829 stars. Used by 15 other packages. Actively maintained with 82 commits in the last 30 days. Available on PyPI.

Use this if you are a developer building RAG applications and need a fast, efficient, and flexible way to prepare diverse text data for embedding and storage in vector databases.

Not ideal if you need a ready-made end-user application: this is a library for technical implementation, aimed at developers building RAG applications.

Topics: RAG development, LLM application development, text preprocessing, vector database integration, AI application engineering

Maintenance: 22 / 25
Adoption: 15 / 25
Maturity: 25 / 25
Community: 18 / 25


Stars: 3,829
Forks: 256
Language: Python
License: MIT
Last pushed: Mar 12, 2026
Commits (30d): 82
Dependencies: 4
Reverse dependents: 15

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/chonkie-inc/chonkie"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
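The same request can be made from Python with the standard library. The sketch below rests on two assumptions that the page does not confirm: that the endpoint returns JSON, and that the path follows a category/owner/repo pattern inferred from the single example URL above. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path pattern inferred from one example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository.

    Assumes the response body is JSON; the response schema is not
    documented on this page, so the parsed object is returned as-is.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("rag", "chonkie-inc", "chonkie")` would issue the same request as the curl command. Each call counts against the daily request limit, so caching the result locally is a sensible default.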