Document Chunking RAG Tools

Tools for splitting, segmenting, and optimizing documents into chunks for RAG pipelines. Includes chunking strategies (fixed, semantic, adaptive), chunk visualization/validation, and parameter optimization. Does NOT include document parsing, extraction, embedding, or retrieval components.
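As a point of reference for the "fixed" strategy named above, here is a minimal sketch of fixed-size chunking with overlap. It is not taken from any tool in this list. The function name `fixed_size_chunks` and its parameters are hypothetical, and it counts characters rather than tokens to stay dependency-free.

```python
def fixed_size_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into character windows of chunk_size.

    Each window shares `overlap` characters with the previous one.
    Hypothetical helper for illustration; real tools usually count tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk made only of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


# Usage: windows of 4 characters, each overlapping the previous by 1.
print(fixed_size_chunks("abcdefghij", chunk_size=4, overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

Semantic and adaptive strategies replace the fixed window with boundaries chosen from the content itself, which is why they typically need a sentence splitter or an embedding model on top of logic like this.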

39 document chunking tools are tracked. One scores above 70 (Verified tier). The highest-rated is chonkie-inc/chonkie at 80/100 with 3,829 stars. One of the top 10 is actively maintained.

Get all 39 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=document-chunking&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
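The same request can be made from Python using only the standard library. This is a sketch under assumptions: the URL and the `domain`, `subcategory`, and `limit` query parameters are copied from the curl command above, but the shape of the JSON response is not documented here, so the code simply decodes whatever comes back. The helper names `build_url` and `fetch_dataset` are hypothetical. Whether the endpoint accepts a `limit` above the 20 shown in the example is also an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; nothing else about the API is assumed.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str = "rag",
              subcategory: str = "document-chunking",
              limit: int = 20) -> str:
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_dataset(url: str, timeout: float = 30.0):
    """Fetch the URL and decode the body as JSON (response schema is unknown)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a network request and counts against the daily quota):
#   data = fetch_dataset(build_url())
#   print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to inspect the exact request before spending one of the 100 keyless daily requests.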

# | Tool | Description | Score | Tier
1 | chonkie-inc/chonkie | 🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast,... | 80 | Verified
2 | speedyk-005/chunklet-py | One library to split them all: Sentence, Code, Docs. Chunk smarter, not... | 48 | Emerging
3 | jchunk-io/jchunk | JChunk is a lightweight and flexible library designed to provide multiple... | 47 | Emerging
4 | andreshere00/Splitter_MR | Chunk your data into markdown text blocks for your LLM applications | 44 | Emerging
5 | chonkie-inc/chonkiejs | 🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and... | 43 | Emerging
6 | thom-heinrich/chonkify | Extractive document compression for RAG and agent pipelines. +69% vs... | 39 | Emerging
7 | messkan/rag-chunk | A Python CLI to test, benchmark, and find the best RAG chunking strategy for... | 39 | Emerging
8 | chonkie-inc/mtcb | 🤔 wondering if your chunks are good? 🦉 Judie is here to Judge and Evaluate... | 37 | Emerging
9 | ayush585/SmartChunk | SmartChunk is a lightweight, structure-aware semantic chunking toolkit... | 37 | Emerging
10 | ALucek/chunking-strategies | An Overview of the Latest Document Chunking Research | 36 | Emerging
11 | GiovanniPasq/chunky | Validate, visualize, edit, and export chunks for RAG pipelines. | 32 | Emerging
12 | AceAtDev/RAG-chunker | The easiest and most effective way tool to retrain a RAG LLM/GEN AI/Agent on... | 31 | Emerging
13 | bazilicum/axonode-chunker | Advanced semantic text chunking with custom structural markers, whole-text... | 30 | Emerging
14 | smart-models/Sentences-Chunker | Cutting-edge tool designed to intelligently segment text documents into... | 29 | Experimental
15 | mirpo/chopdoc | A tool to split documents into chunks for RAG and LLM applications | 28 | Experimental
16 | wevote-project/crystal-text-splitter | Intelligent text chunking for RAG (Retrieval-Augmented Generation) and LLM... | 27 | Experimental
17 | yuma-shintani/chunksize-checker | Calculate the number of total tokens, optimal chunk size and chunk overlap... | 27 | Experimental
18 | arclabs561/slabs | Text chunking for RAG: fixed, sentence, recursive, and semantic splitting | 26 | Experimental
19 | ekimetrics/adaptive-chunking | Adaptive Chunking: automatically select the best chunking method per... | 25 | Experimental
20 | philip-zhan/semchunk.rb | Ruby port of https://github.com/isaacus-dev/semchunk | 25 | Experimental
21 | stranger00135/ragflow-optimizer | Automatically discover the best RAGFlow chunking parameters for each... | 24 | Experimental
22 | asukhodko/dify-markdown-chunker | Advanced Markdown text chunker tool plugin for Dify RAG / knowledge bases | 24 | Experimental
23 | MukundaKatta/ChunkWise | ChunkWise — Intelligent Document Chunking. Smart document chunking for RAG pipelines | 22 | Experimental
24 | zenwor/icm_rag | 🧩 Intelligent Chunking Methods for Code Documentation RAG | 22 | Experimental
25 | pranavms13/deepcontext | A semantic chunking service for documents, GitHub repos, webpages, and... | 21 | Experimental
26 | cwccie/ragchunk | Chunking library for technical documentation — domain-aware splitting for... | 21 | Experimental
27 | Leo310/rag-chunking-evaluation | Assess the effectiveness of chunking strategies in RAG systems via a custom... | 18 | Experimental
28 | fujiba/pdf-chunker | LLM-friendly PDF splitter & image optimizer. Chunk PDFs by size and... | 15 | Experimental
29 | sanbaiw/semtxtsplitter | A smol Go package for splitting text into chunks while preserving semantic meaning. | 14 | Experimental
30 | Devparihar5/chunking-strategies-comparison | A deep dive into text chunking for Retrieval-Augmented Generation systems | 13 | Experimental
31 | DTufail/rag-chunk-eval | Benchmarking harness for RAG chunking strategies — compares Fixed,... | 13 | Experimental
32 | tainmou/SmartChunk | 🧩 Enhance RAG processes with SmartChunk, a Python package that creates... | 13 | Experimental
33 | hemantjuyal/Latent-Chunk-Lab | A hands-on playground to explore different chunking techniques for... | 13 | Experimental
34 | OneOffTech/the-chunk-list | A comprehensive open-source database of document parsers, their pricing, and... | 13 | Experimental
35 | Arnav-Ajay/rag-chunking-strategies | A controlled study showing how different chunking strategies change which... | 13 | Experimental
36 | CodyPedersen/tikchunk | Highly performant python library for recursive semantic text chunking while... | 13 | Experimental
37 | AleGallagher/ChunkingTechniques | 🚀 Comprehensive evaluation of chunking techniques for RAG pipelines. Compare... | 13 | Experimental
38 | mburaksayici/SemanticChunkingAndPropositionModel | SemanticChunkingPropositionModel | 11 | Experimental
39 | davidmoserai/AzureDocumentIntelligenceChunker | A lightweight Python library for metadata-rich document chunking in... | 10 | Experimental