mirth/chonky
Fully neural approach for text chunking
This tool uses a fully neural model to split large blocks of text into smaller, meaningful segments. You provide raw text, optionally with markdown or HTML formatting, and it returns a series of semantically distinct paragraphs or chunks. It suits anyone who needs to process long documents section by section for analysis or retrieval.
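The text-in, chunks-out flow described here can be sketched as follows. This is a minimal sketch, not the project's documented interface: `collect_chunks` and `make_chonky_splitter` are hypothetical helper names, and the `ParagraphSplitter(device=...)` call reflects the usage recalled from the project's README, so check it against the installed version before relying on it.

```python
from typing import Callable, Iterable, List


def collect_chunks(text: str, splitter: Callable[[str], Iterable[str]]) -> List[str]:
    """Run any splitter callable over the text and keep the non-empty chunks."""
    chunks = []
    for chunk in splitter(text):
        cleaned = chunk.strip()
        if cleaned:
            chunks.append(cleaned)
    return chunks


def make_chonky_splitter(device: str = "cpu"):
    """Build a chonky splitter (hypothetical helper).

    Assumption: the package exposes ParagraphSplitter and the resulting
    object is callable on a string, yielding chunks. The first call is
    expected to download a transformer model, so it needs network access.
    """
    from chonky import ParagraphSplitter  # imported lazily; requires `pip install chonky`

    return ParagraphSplitter(device=device)
```

With the package installed, `collect_chunks(long_text, make_chonky_splitter())` would return the chunks as a list of strings. Because `collect_chunks` accepts any callable, a different splitter can be swapped in without changing the calling code.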
407 stars. Available on PyPI.
Use this if you need to reliably split lengthy documents into coherent, context-rich chunks for tasks like building a question-answering system or performing detailed content analysis.
Not ideal if you require simple, fixed-length text splitting, or if your primary focus is on parsing very short, structured data snippets rather than natural language documents.
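For contrast, the simple fixed-length splitting mentioned above needs no model at all. The following is a minimal sketch of that baseline; `split_fixed` and its `size` and `overlap` parameters are hypothetical names chosen for illustration and are not part of this tool.

```python
from typing import List


def split_fixed(text: str, size: int, overlap: int = 0) -> List[str]:
    """Split text into windows of at most `size` characters.

    Consecutive windows share `overlap` characters. Boundaries fall
    wherever the character count dictates, so sentences and paragraphs
    may be cut in the middle.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```

This baseline is fast and predictable, but it ignores meaning entirely, which is the gap a semantic chunker is meant to fill.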
Stars: 407
Forks: 16
Language: Python
License: MIT
Category:
Last pushed: Oct 23, 2025
Commits (30d): 0
Dependencies: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/mirth/chonky"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Related tools
sentencizer/sentencizer
A sentence splitting (sentence boundary disambiguation) library for Go. It is rule-based and...
jackfsuia/bert-chunker
bert-chunker: efficient and trained chunking for unstructured documents. Trains BERT to segment documents.
prajwal10001/semantic-chunker-langchain
Token-aware, LangChain-compatible semantic chunker with PDF, markdown, and layout support
bgokden/fast-text-splitter
fast text splitter with onnx