smart-models/Normalized-Semantic-Chunker
A semantic chunking tool that splits documents into meaning-preserving segments with normalized token counts
This tool helps knowledge managers and AI engineers prepare long documents for large language models (LLMs) and retrieval systems. You input raw text, Markdown, or JSON files, and it produces semantically coherent document segments. These segments are optimized for consistent token counts, preventing issues like context window overflow in LLMs.
No commits in the last 6 months.
Use this if you need to precisely control the token size of text chunks while maintaining their semantic meaning, especially for RAG pipelines or other token-sensitive NLP applications.
Not ideal if you only need basic text splitting without concern for semantic coherence or precise control over chunk token counts.
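The technique described above — grouping adjacent sentences by semantic similarity while keeping each chunk under a token cap — can be sketched roughly as follows. This is an illustrative toy, not this repository's code: a bag-of-words vector stands in for a real embedding model (e.g. sentence-transformers), and whitespace splitting stands in for a real tokenizer.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    # Stand-in for a real sentence-embedding model; a bag-of-words
    # Counter is enough to illustrate the control flow.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(text, max_tokens=40, threshold=0.2):
    # Greedy semantic chunking: start a new chunk when the next sentence
    # is dissimilar to the current chunk or would exceed the token cap.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], []
    for sent in sentences:
        joined = " ".join(current + [sent])
        n_tokens = len(joined.split())  # crude whitespace token count
        if current and (
            n_tokens > max_tokens
            or cosine(toy_embed(" ".join(current)), toy_embed(sent)) < threshold
        ):
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

With a real embedding model and tokenizer plugged in, the same loop yields chunks that stay semantically coherent while respecting a token budget — the property that matters for RAG pipelines and context-window limits.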
Stars: 21
Forks: 5
Language: Python
License: GPL-3.0
Category:
Last pushed: Sep 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/smart-models/Normalized-Semantic-Chunker"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
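The same request can be made from Python with the standard library. The endpoint URL comes from the curl example above; the shape of the JSON response is not documented here, so this sketch just fetches and returns whatever the API sends back.

```python
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/embeddings"

def metrics_url(owner, repo):
    # Build the per-repository endpoint used in the curl example.
    return f"{API_BASE}/{owner}/{repo}"

def fetch_metrics(owner, repo):
    # Anonymous access is rate-limited to 100 requests/day per the listing;
    # a free key raises that to 1,000/day.
    with urlopen(metrics_url(owner, repo)) as resp:
        return json.load(resp)
```

For example, `fetch_metrics("smart-models", "Normalized-Semantic-Chunker")` issues the same GET request as the curl command above.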
Higher-rated alternatives
jparkerweb/semantic-chunking
🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows
drittich/SemanticSlicer
🧠✂️ SemanticSlicer — A smart text chunker for LLM-ready documents.
ndgigliotti/afterthoughts
Sentence-aware embeddings using late chunking with transformers.
ReemHal/Semantic-Text-Segmentation-with-Embeddings
Uses GloVe embeddings and greedy sequence segmentation to semantically segment a text document...
agamm/semantic-split
A Python library to chunk/group your texts based on semantic similarity.