drittich/SemanticSlicer
🧠✂️ SemanticSlicer — A smart text chunker for LLM-ready documents.
Large text documents such as articles, reports, or web pages often need to be broken into smaller, meaningful pieces before they are passed to AI models. This tool takes raw text (from files or piped input) and outputs a series of segmented chunks that preserve structural elements such as sentences and headings. It is designed for anyone preparing content for natural language processing tasks, especially those involving AI embeddings.
Use this if you need to reliably segment long text documents (e.g., Markdown, HTML, plain text) into smaller, semantically coherent chunks for use with large language models or AI embedding services.
Not ideal if you only need basic text splitting by character count, or if your primary goal is simple string manipulation rather than preparing text for AI processing.
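To make the contrast with plain character-count splitting concrete, here is a minimal Python sketch of sentence-aware chunking. It is not SemanticSlicer's code or API (the project itself is written in C#). The function names `split_sentences` and `chunk_by_sentences`, the regex-based sentence splitter, and the character-based size limit are all assumptions chosen for illustration. A real chunker would more likely measure size in tokens and would also respect headings and other structure.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_by_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own oversized chunk
    instead of being cut mid-sentence.
    """
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the chunk, start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in chunk_by_sentences("One. Two. Three.", max_chars=9):
        print(chunk)
```

A fixed-width splitter would cut wherever the character budget runs out. The greedy loop above only ever breaks at a sentence boundary, which is the property that matters when each chunk is embedded on its own.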
Stars: 36
Forks: 5
Language: C#
License: MIT
Category:
Last pushed: Feb 26, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/drittich/SemanticSlicer"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
jparkerweb/semantic-chunking
🍱 semantic-chunking ⇢ semantically create chunks from large document for passing to LLM workflows
smart-models/Normalized-Semantic-Chunker
Cutting-edge tool that unlocks the full potential of semantic chunking
ndgigliotti/afterthoughts
Sentence-aware embeddings using late chunking with transformers.
ReemHal/Semantic-Text-Segmentation-with-Embeddings
Uses GloVe embeddings and greedy sequence segmentation to semantically segment a text document...
agamm/semantic-split
A Python library to chunk/group your texts based on semantic similarity.