drittich/SemanticSlicer

🧠✂️ SemanticSlicer — A smart text chunker for LLM-ready documents.

Emerging

When working with large text documents, like articles, reports, or web pages, you often need to break them into smaller, meaningful pieces to prepare them for AI models. This tool takes your raw text (from files or piped input) and outputs a series of carefully segmented text chunks, preserving important structural elements like sentences and headings. It's designed for anyone preparing content for natural language processing tasks, especially those involving AI embeddings.

Use this if you need to reliably segment long text documents (e.g., Markdown, HTML, plain text) into smaller, semantically coherent chunks for use with large language models or AI embedding services.

Not ideal if you only need basic text splitting by character count, or if your primary goal is simple string manipulation rather than preparing text for AI processing.
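To make the contrast with plain character-count splitting concrete, here is a minimal Python sketch of structure-aware chunking. It is not SemanticSlicer's API or its actual algorithm: the `split_text` function, the `SEPARATORS` list, and the `max_len` parameter are all hypothetical names chosen for illustration, and chunk size is measured in characters here, not in model tokens. The idea is to try coarse boundaries first (headings, then paragraphs, then sentences) and fall back to finer ones only when a piece is still too long.

```python
import re

# Hypothetical separator priority, coarsest first. These patterns are
# assumptions for illustration, not the tool's real configuration.
SEPARATORS = [
    r"\n(?=#{1,6} )",    # newline just before a Markdown heading
    r"\n\s*\n",          # blank line between paragraphs
    r"(?<=[.!?])\s+",    # whitespace after sentence-ending punctuation
    r"\s+",              # any whitespace, as a last structural resort
]


def split_text(text, max_len=200, level=0):
    """Recursively split text so every chunk has at most max_len characters."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_len:
        return [text]
    if level >= len(SEPARATORS):
        # No structural boundary left: cut at fixed character offsets.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    chunks = []
    for piece in re.split(SEPARATORS[level], text):
        # Each oversized piece is retried with the next, finer separator.
        chunks.extend(split_text(piece, max_len, level + 1))
    return chunks


if __name__ == "__main__":
    doc = (
        "# Title\n\n"
        "First sentence here. Second sentence here.\n\n"
        "## Section\n\n"
        "More text."
    )
    for chunk in split_text(doc, max_len=30):
        print(repr(chunk))
```

This sketch never merges small neighbouring pieces back together, so it can produce chunks much shorter than `max_len`; a production chunker would typically add a merge step and count tokens with the target model's tokenizer.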

content-preparation document-processing text-analysis knowledge-management

No Package No Dependents

Maintenance 10 / 25

Adoption 7 / 25

Maturity 16 / 25

Community 12 / 25

License MIT

Higher-rated alternatives

jparkerweb/semantic-chunking

🍱 semantic-chunking ⇢ semantically create chunks from large document for passing to LLM workflows

smart-models/Normalized-Semantic-Chunker

Cutting-edge tool that unlocks the full potential of semantic chunking

ndgigliotti/afterthoughts

Sentence-aware embeddings using late chunking with transformers.

ReemHal/Semantic-Text-Segmentation-with-Embeddings

Uses GloVe embeddings and greedy sequence segmentation to semantically segment a text document...

agamm/semantic-split

A Python library to chunk/group your texts based on semantic similarity.
