chonkie and chunklet-py
These are competitors: both are chunking libraries that split documents into semantically meaningful pieces for RAG pipelines. Chonkie offers more mature, production-tested functionality, while chunklet-py provides a simpler alternative that handles multiple input formats.
About chonkie
chonkie-inc/chonkie
🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines
This is a lightweight tool for developers building Retrieval-Augmented Generation (RAG) applications. It takes text data in various forms, splits it into smaller, meaningful parts (chunks), and then refines and embeds those chunks. The output is a set of optimized chunks ready to be stored in a vector database and retrieved efficiently as context for large language models.
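To make the splitting step concrete, here is a minimal, library-agnostic sketch of fixed-size chunking with overlap, one common way to cut text into chunks. It is not Chonkie's API: the `Chunk` class, the `token_window_chunks` function, and its `chunk_size`/`overlap` parameters are hypothetical, and whitespace-separated words stand in for the model tokens a real chunker would count. The refinement and embedding steps are left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    start: int        # index of the chunk's first token
    end: int          # index one past the chunk's last token
    token_count: int


def token_window_chunks(text: str, chunk_size: int, overlap: int) -> list[Chunk]:
    """Split text into fixed-size token windows that overlap by `overlap` tokens.

    Hypothetical helper, not a library API. Assumption: whitespace-separated
    words stand in for tokenizer tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    tokens = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks: list[Chunk] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(Chunk(" ".join(window), start, start + len(window), len(window)))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end; avoid a redundant tail chunk
    return chunks


if __name__ == "__main__":
    words = "one two three four five six seven eight nine ten eleven twelve"
    for c in token_window_chunks(words, chunk_size=5, overlap=2):
        print(c.start, c.end, c.text)
    # 0 5 one two three four five
    # 3 8 four five six seven eight
    # 6 11 seven eight nine ten eleven
    # 9 12 ten eleven twelve
```

Overlap trades some duplicated storage for continuity: text near a chunk boundary appears in both neighboring chunks, so a query about it can match either one.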
About chunklet-py
speedyk-005/chunklet-py
One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.
This tool helps AI engineers and researchers prepare text, documents, and code for use with large language models (LLMs) and retrieval-augmented generation (RAG) systems. It accepts raw text, PDFs, Word documents, code files, and other formats, and breaks them into smaller, meaningful, context-rich pieces. The output is chunked data that preserves the source's meaning and structure, along with metadata intended to improve downstream LLM performance.
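Preserving structure usually means splitting on natural boundaries rather than at arbitrary offsets. The sketch below illustrates that idea with sentences: it groups whole sentences into chunks and records simple metadata (chunk index, character offsets, sentence count) for each one. It is not chunklet-py's API; `SentenceChunk`, `split_sentences`, `sentence_chunks`, and `max_sentences` are hypothetical names, and the regular-expression sentence splitter is a deliberately naive assumption that will mis-split abbreviations such as "e.g." and decimals such as "3.14".

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Naive rule (assumption): a sentence is a run of non-terminator characters
# followed by one or more of . ! ? (or by the end of the text).
_SENTENCE = re.compile(r"[^.!?]+(?:[.!?]+|$)")


@dataclass
class SentenceChunk:
    text: str
    index: int           # position of this chunk in the output list
    char_start: int      # offset of the chunk's first character in the source
    char_end: int        # offset one past the chunk's last character
    sentence_count: int  # number of whole sentences in the chunk


def split_sentences(text: str) -> list[tuple[int, int]]:
    """Return (start, end) character spans of sentences, trimmed of whitespace."""
    spans: list[tuple[int, int]] = []
    for m in _SENTENCE.finditer(text):
        start, end = m.start(), m.end()
        while start < end and text[start].isspace():
            start += 1
        while end > start and text[end - 1].isspace():
            end -= 1
        if start < end:  # skip whitespace-only matches
            spans.append((start, end))
    return spans


def sentence_chunks(text: str, max_sentences: int) -> list[SentenceChunk]:
    """Group whole sentences into chunks of at most `max_sentences` each."""
    if max_sentences <= 0:
        raise ValueError("max_sentences must be positive")
    spans = split_sentences(text)
    chunks: list[SentenceChunk] = []
    for i in range(0, len(spans), max_sentences):
        group = spans[i:i + max_sentences]
        start, end = group[0][0], group[-1][1]
        chunks.append(SentenceChunk(text[start:end], len(chunks), start, end, len(group)))
    return chunks


if __name__ == "__main__":
    doc = "Alpha one. Beta two! Gamma three? Delta four."
    for c in sentence_chunks(doc, max_sentences=2):
        print(f"{c.index}: [{c.char_start}, {c.char_end}) {c.text}")
    # 0: [0, 20) Alpha one. Beta two!
    # 1: [21, 45) Gamma three? Delta four.
```

Keeping character offsets lets a retrieval system point back to the exact passage in the source, and splitting on sentence boundaries avoids cutting a sentence in half, at the cost of less uniform chunk sizes.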
Scores updated daily from GitHub, PyPI, and npm data.