chonkie and chunklet-py

Chonkie and chunklet-py are competitors: both are chunking libraries that split documents into semantically meaningful pieces for RAG pipelines. Chonkie is the more mature, production-tested option, while chunklet-py is a simpler, multi-format alternative.

                 chonkie      chunklet-py
Score            80           48
Status           Verified     Emerging
Maintenance      22/25        10/25
Adoption         15/25        9/25
Maturity         25/25        24/25
Community        18/25        5/25
Stars            3,829        64
Forks            256          2
Downloads        (not shown)  (not shown)
Commits (30d)    82           0
Language         Python       Python
License          MIT          MIT
Risk flags       None         None
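
The page does not state how the headline score is computed, but the figures above are consistent with a plain sum of the four sub-scores, each out of 25. The sketch below checks that arithmetic; the `total_score` helper and the summing rule are assumptions for illustration, not the site's documented method.

```python
# Sub-scores copied from the comparison above; each is out of 25.
SUBSCORES = {
    "chonkie": {"maintenance": 22, "adoption": 15, "maturity": 25, "community": 18},
    "chunklet-py": {"maintenance": 10, "adoption": 9, "maturity": 24, "community": 5},
}


def total_score(parts: dict[str, int]) -> int:
    """Assumed aggregation: a plain sum of the four sub-scores (max 100)."""
    return sum(parts.values())


totals = {name: total_score(parts) for name, parts in SUBSCORES.items()}
print(totals)  # {'chonkie': 80, 'chunklet-py': 48}
```

Both totals match the headline scores shown for each library, which is why the sum is a plausible reading of the scorecard.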

About chonkie

chonkie-inc/chonkie

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines

Chonkie is a lightweight tool for developers building Retrieval-Augmented Generation (RAG) applications. It takes various forms of text data, splits it into smaller, meaningful parts (chunks), and then refines and embeds these chunks. The output is a set of text chunks ready to be stored in a vector database for efficient retrieval by large language models.

RAG development, LLM application development, text preprocessing, vector database integration, AI application engineering
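
To make the "split into chunks" step described for chonkie concrete, here is a minimal standard-library sketch of fixed-size chunking with overlap. It does not use chonkie's API; the `Chunk` class, the `chunk_words` function, and the whitespace word splitting are all hypothetical stand-ins, and a real pipeline would typically count tokens with a tokenizer instead of words.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    start_word: int  # index of the chunk's first word in the source text
    word_count: int


def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[Chunk]:
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks: list[Chunk] = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(Chunk(" ".join(window), start, len(window)))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = "one two three four five six seven eight nine ten"
chunks = chunk_words(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk.start_word, repr(chunk.text))
```

The overlap means neighboring chunks repeat a little context, which can help retrieval when a relevant passage straddles a chunk boundary. Each chunk's text would then be embedded and written to a vector store.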

About chunklet-py

speedyk-005/chunklet-py

One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.

Chunklet-py helps AI engineers and researchers prepare text, documents, and code for use in large language models (LLMs) and retrieval-augmented generation (RAG) systems. It takes in raw text, PDFs, Word documents, code files, and more, then breaks them down into smaller, meaningful, context-rich pieces. The output is chunked data that preserves meaning and structure, along with metadata about each chunk.

AI-engineering, LLM-data-prep, RAG-pipeline-optimization, document-processing, code-analysis
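
The idea of chunks that respect sentence structure and carry metadata, as described for chunklet-py, can be sketched with the standard library alone. This is not chunklet-py's API; the `chunk_sentences` function, the metadata keys, and the naive regex sentence splitter are assumptions chosen for illustration.

```python
import re


def chunk_sentences(text: str, max_sentences: int = 2, source: str = "inline") -> list[dict]:
    """Group whole sentences into chunks and attach simple metadata to each."""
    if max_sentences <= 0:
        raise ValueError("max_sentences must be positive")
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[dict] = []
    for i in range(0, len(sentences), max_sentences):
        group = sentences[i:i + max_sentences]
        chunks.append({
            "content": " ".join(group),
            "metadata": {
                "source": source,  # hypothetical label for where the text came from
                "chunk_num": len(chunks) + 1,
                "sentence_range": (i, i + len(group) - 1),  # inclusive indices
            },
        })
    return chunks


doc = "Chunking splits text. Each chunk keeps context! Does metadata help? It can."
result = chunk_sentences(doc, max_sentences=2, source="example.txt")
for item in result:
    print(item["metadata"]["chunk_num"], item["content"])
```

Because chunk boundaries always fall between sentences, no sentence is cut in half, and the metadata lets a retrieval system trace each chunk back to its position in the source.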

Scores updated daily from GitHub, PyPI, and npm data.