andreshere00/Splitter_MR
Chunk your data into markdown text blocks for your LLM applications
This tool helps developers who work with large language models (LLMs) prepare source data for their applications. It reads diverse file formats, including text, PDFs, Office documents, JSON, and images, and outputs organized chunks of text in Markdown format. It suits developers building LLM-powered applications who need to manage and segment source data efficiently.
Available on PyPI.
Use this if you need to reliably break down unstructured data from many file types into manageable, semantically coherent text blocks for your LLM applications.
Not ideal if you only work with small, pre-formatted text segments or do not develop applications using large language models.
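To make the idea of chunking into Markdown text blocks concrete, here is a minimal sketch in plain Python. It is not Splitter_MR's API: the function name `split_markdown_by_heading` and the `max_chars` parameter are hypothetical, and the heading detection is deliberately naive (any line starting with `#` counts, including ones inside code fences).

```python
def split_markdown_by_heading(text: str, max_chars: int = 1000) -> list[str]:
    """Split Markdown text into chunks, starting a new chunk at each heading.

    Illustrative only: this is a generic technique, not the library's method.
    Sections longer than max_chars are cut into fixed-size pieces.
    """
    sections: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        # A heading line closes the section collected so far.
        if line.startswith("#") and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    chunks: list[str] = []
    for section in sections:
        # Empty sections produce no chunks because range(0, 0) is empty.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks
```

A real splitter would typically also respect sentence or token boundaries rather than cutting oversized sections at a fixed character offset.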
Stars: 25
Forks: 2
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jan 08, 2026
Commits (30d): 0
Dependencies: 18
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/andreshere00/Splitter_MR"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
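The same request can be made from Python with only the standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, and the listing does not document the response format, so the code assumes the body is JSON.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository, matching the curl example."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumption: the response body is JSON; the listing does not say so.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("andreshere00", "Splitter_MR")` issues the same GET request as the curl command. Without a key, the 100 requests/day limit applies, so callers may want to cache results.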
Higher-rated alternatives
chonkie-inc/chonkie
🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust...
speedyk-005/chunklet-py
One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs,...
jchunk-io/jchunk
JChunk is a lightweight and flexible library designed to provide multiple strategies for text...
chonkie-inc/chonkiejs
🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library
thom-heinrich/chonkify
Extractive document compression for RAG and agent pipelines. +69% vs LLMLingua, +175% vs...