andreshere00/Splitter_MR

Chunk your data into markdown text blocks for your LLM applications

Emerging

This tool helps developers of large language model (LLM) applications prepare many kinds of source data. It takes in diverse file formats such as text, PDFs, Office documents, JSON, and images, processes them, and outputs organized chunks of text in Markdown format. It is aimed at developers building LLM-powered applications who need to manage and segment source data efficiently.

Available on PyPI.

Use this if you need to reliably break down unstructured data from many file types into manageable, semantically coherent text blocks for your LLM applications.

Not ideal if you only work with small, pre-formatted text segments or do not develop applications using large language models.
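To make the idea of "Markdown text blocks" concrete, here is a minimal, generic sketch of heading-aware chunking in plain Python. It is an illustration only and does not use or reproduce the Splitter_MR API: the function name `chunk_markdown` and the `max_chars` parameter are hypothetical, and a real pipeline would first convert PDFs, Office files, or images to Markdown. It also treats any paragraph starting with `#` as a heading, which is a simplification.

```python
import re


def chunk_markdown(text: str, max_chars: int = 500) -> list[str]:
    """Split Markdown text into chunks (hypothetical helper, not a library API).

    A new chunk starts at every heading paragraph, and also whenever adding
    the next paragraph would push the current chunk past max_chars.
    """
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[str] = []
    current: list[str] = []
    size = 0  # characters accumulated in the current chunk (separators excluded)

    for para in paragraphs:
        is_heading = para.startswith("#")
        too_big = size + len(para) > max_chars
        # Flush the running chunk before starting a new section or overflowing.
        if current and (is_heading or too_big):
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para)

    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    sample = "# Intro\n\nHello world.\n\n# Usage\n\nStep one.\n\nStep two."
    for i, chunk in enumerate(chunk_markdown(sample, max_chars=20), start=1):
        print(f"--- chunk {i} ---")
        print(chunk)
```

Splitting on headings first keeps each chunk inside one section, which is one simple way to approximate "semantically coherent" blocks. The size cap only applies within a section, and a single paragraph longer than `max_chars` is kept whole rather than cut.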

LLM development, data preprocessing, document parsing, text chunking, AI application development

Maintenance 6 / 25

Adoption 7 / 25

Maturity 24 / 25

Community 7 / 25

Language: Jupyter Notebook

License: MIT

Higher-rated alternatives

chonkie-inc/chonkie

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust...

speedyk-005/chunklet-py

One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder — built for LLMs,...

jchunk-io/jchunk

JChunk is a lightweight and flexible library designed to provide multiple strategies for text...

chonkie-inc/chonkiejs

🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library

thom-heinrich/chonkify

Extractive document compression for RAG and agent pipelines. +69% vs LLMLingua, +175% vs...
