smart-models/Normalized-Semantic-Chunker
A semantic chunking tool that splits documents into meaning-preserving segments with normalized token counts
This tool helps knowledge managers and AI engineers prepare long documents for large language models (LLMs) and retrieval systems. You input raw text, Markdown, or JSON files, and it produces semantically coherent document segments. These segments are optimized for consistent token counts, preventing issues like context window overflow in LLMs.
No commits in the last 6 months.
Use this if you need to precisely control the token size of text chunks while maintaining their semantic meaning, especially for RAG pipelines or other token-sensitive NLP applications.
Not ideal if you only need basic text splitting without concern for semantic coherence or precise control over chunk token counts.
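The technique described above — grouping adjacent sentences by semantic similarity while keeping each chunk under a token cap — can be sketched roughly as follows. This is an illustrative toy, not this repository's code: a bag-of-words vector stands in for a real embedding model (e.g. sentence-transformers), and whitespace splitting stands in for a real tokenizer.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    # Stand-in for a real sentence-embedding model; a bag-of-words
    # Counter is enough to illustrate the control flow.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(text, max_tokens=40, threshold=0.2):
    # Greedy semantic chunking: start a new chunk when the next sentence
    # is dissimilar to the current chunk or would exceed the token cap.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], []
    for sent in sentences:
        joined = " ".join(current + [sent])
        n_tokens = len(joined.split())  # crude whitespace token count
        if current and (
            n_tokens > max_tokens
            or cosine(toy_embed(" ".join(current)), toy_embed(sent)) < threshold
        ):
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

With a real embedding model and tokenizer plugged in, the same loop yields chunks that stay semantically coherent while respecting a token budget — the property that matters for RAG pipelines and context-window limits.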
Stars: 21
Forks: 5
Language: Python
License: GPL-3.0
Category:
Last pushed: Sep 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/smart-models/Normalized-Semantic-Chunker"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
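The same request can be made from Python with the standard library. The endpoint URL comes from the curl example above; the shape of the JSON response is not documented here, so this sketch just fetches and returns whatever the API sends back.

```python
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/embeddings"

def metrics_url(owner, repo):
    # Build the per-repository endpoint used in the curl example.
    return f"{API_BASE}/{owner}/{repo}"

def fetch_metrics(owner, repo):
    # Anonymous access is rate-limited to 100 requests/day per the listing;
    # a free key raises that to 1,000/day.
    with urlopen(metrics_url(owner, repo)) as resp:
        return json.load(resp)
```

For example, `fetch_metrics("smart-models", "Normalized-Semantic-Chunker")` issues the same GET request as the curl command above.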
Higher-rated alternatives
jparkerweb/semantic-chunking
🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows
drittich/SemanticSlicer
🧠✂️ SemanticSlicer — A smart text chunker for LLM-ready documents.
ndgigliotti/afterthoughts
Sentence-aware embeddings using late chunking with transformers.
ReemHal/Semantic-Text-Segmentation-with-Embeddings
Uses GloVe embeddings and greedy sequence segmentation to semantically segment a text document...
agamm/semantic-split
A Python library to chunk/group your texts based on semantic similarity.