chonkie and chunky
These are complementary tools: Chonkie handles the core chunking and ingestion for RAG pipelines, while Chunky provides validation, visualization, and editing capabilities for inspecting and refining the chunks that Chonkie produces.
About chonkie
chonkie-inc/chonkie
🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines
Chonkie is a lightweight library for developers building Retrieval-Augmented Generation (RAG) applications. It takes text data in various forms, splits it into smaller, meaningful parts (chunks), then refines and embeds those chunks. The output is a set of chunks ready to be stored in a vector database and retrieved for large language models.
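To make the splitting step concrete, here is a minimal sketch of fixed-size chunking with overlap, written in plain Python. It does not use Chonkie's API and is not how Chonkie is implemented; the function name `chunk_words` and its parameters `chunk_size` and `overlap` are hypothetical, chosen only for illustration. It counts whitespace-separated words, whereas a real pipeline would more likely count tokenizer tokens.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that context spanning a
    chunk boundary is not lost. Illustrative helper, not a Chonkie function.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made only of overlap words is emitted.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = "a b c d e f g h i j"
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

The overlap is a common design choice in RAG pipelines: a sentence cut by a chunk boundary still appears whole in at least one chunk, at the cost of storing some words twice.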
About chunky
GiovanniPasq/chunky
Validate, visualize, edit, and export chunks for RAG pipelines.
Chunky helps AI engineers and data scientists build more reliable Retrieval-Augmented Generation (RAG) applications by checking the quality of processed source documents. You input PDFs and get validated Markdown and structured chunks, ready for your vector database. It is designed for anyone setting up a RAG pipeline who needs to visually inspect and refine the output of document processing.