chonkie and chunking-strategies

chonkie, a production-ready chunking library, and chunking-strategies, a research overview repository, are **complements**: the research overview can inform design decisions and benchmarking choices for the library, while practitioners using the library might consult the overview to understand the algorithmic tradeoffs behind their chunking strategy.

| | chonkie | chunking-strategies |
|---|---|---|
| Score (out of 100) | 80 (Verified) | 36 (Emerging) |
| Maintenance | 22/25 | 0/25 |
| Adoption | 15/25 | 9/25 |
| Maturity | 25/25 | 8/25 |
| Community | 18/25 | 19/25 |
| Stars | 3,829 | 85 |
| Forks | 256 | 18 |
| Downloads | | |
| Commits (30d) | 82 | 0 |
| Language | Python | Jupyter Notebook |
| License | MIT | None |
| Risk flags | None | No License, Stale 6m, No Package, No Dependents |

About chonkie

chonkie-inc/chonkie

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines

This is a lightweight tool for developers building Retrieval-Augmented Generation (RAG) applications. It takes text data in various forms, splits it into smaller, meaningful parts (chunks), and then refines and embeds those chunks. The output is a set of optimized chunks ready to be stored in a vector database and retrieved efficiently as context for large language models.
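To make the splitting step concrete, here is a minimal sketch of the simplest strategy a chunking library typically offers: fixed-size windows with overlap. This is a generic illustration, not chonkie's API; the function name `chunk_by_tokens` and its parameters are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
from typing import List


def chunk_by_tokens(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Hypothetical helper: split text into windows of at most chunk_size tokens.

    Consecutive windows share `overlap` tokens so context is not lost at
    chunk boundaries. Whitespace splitting is an assumption standing in
    for a real tokenizer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    tokens = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        # Stop once a window reaches the end, so no trailing chunk is
        # emitted that is fully contained in the previous one.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    doc = "one two three four five six seven eight nine ten"
    for chunk in chunk_by_tokens(doc, chunk_size=4, overlap=1):
        print(chunk)
```

The overlap trades a little redundancy in the vector store for better recall when a relevant passage straddles a chunk boundary.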

RAG development, LLM application development, text preprocessing, vector database integration, AI application engineering

About chunking-strategies

ALucek/chunking-strategies

An Overview of the Latest Document Chunking Research

This project helps you prepare large text documents for use with AI systems such as chatbots or question-answering tools. It takes raw, unstructured text and breaks it into smaller, optimized pieces that help the AI retrieve relevant context and answer queries more accurately. Anyone building or managing RAG (Retrieval-Augmented Generation) applications, from content managers to data scientists, may find it useful.
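One step up from fixed-size windows is to respect sentence boundaries so that each chunk stays coherent. The sketch below greedily packs whole sentences into chunks under a character budget. It is a generic illustration assumed for this comparison, not code from the chunking-strategies repository; `chunk_by_sentences` and `max_chars` are hypothetical names, and the regex sentence splitter is a simplification that will mis-split abbreviations such as "e.g.".

```python
import re
from typing import List


def chunk_by_sentences(text: str, max_chars: int = 200) -> List[str]:
    """Hypothetical helper: pack whole sentences into chunks of <= max_chars.

    A single sentence longer than max_chars becomes its own chunk rather
    than being cut mid-sentence.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the open chunk
        else:
            if current:
                chunks.append(current)  # close the open chunk
            current = sentence  # start a new chunk with this sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    doc = "RAG needs chunks. Chunks should be coherent. Size matters too."
    print(chunk_by_sentences(doc, max_chars=40))
```

Semantic chunkers refine this idea further by deciding boundaries from embedding similarity between adjacent sentences instead of a fixed character budget.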

AI-application-development natural-language-processing text-retrieval knowledge-management generative-AI

Scores updated daily from GitHub, PyPI, and npm data.