ReemHal/Semantic-Text-Segmentation-with-Embeddings
Uses GloVe embeddings and greedy sequence segmentation to split a text document semantically into a chosen number of segments, k.
This helps break down long text documents into a specific number of shorter, topically consistent sections. You provide a document and the number of parts you want, and it outputs the original text divided into that many segments, each grouping adjacent text on a shared topic. This is useful for researchers, content strategists, or anyone who needs to analyze or summarize lengthy texts by their core themes.
No commits in the last 6 months.
Use this if you need to automatically divide a long document into a predetermined number of thematically similar chunks for easier understanding or analysis.
Not ideal if you need to segment text based on specific structural markers like headings, paragraphs, or predefined keywords.
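As a rough illustration of what greedy sequence segmentation over embeddings can look like, here is a minimal sketch. It is not taken from the repository: the function name `greedy_segment`, the scoring rule (each segment is scored by the length of the sum of its vectors), and the toy 2-D vectors standing in for averaged GloVe sentence embeddings are all assumptions made for this example.

```python
import math


def greedy_segment(vectors, k):
    """Return sorted cut indices that split `vectors` into k contiguous segments.

    Hypothetical helper, not the repository's API. A segment is scored by the
    Euclidean norm of the sum of its vectors; each greedy step adds the single
    cut that raises the total score the most.
    """
    n = len(vectors)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of vectors")
    dim = len(vectors[0])

    # Prefix sums make the sum of any span vectors[a:b] cheap to compute.
    prefix = [[0.0] * dim]
    for v in vectors:
        prefix.append([p + x for p, x in zip(prefix[-1], v)])

    def score(a, b):
        # Norm of the summed vectors in the half-open span [a, b).
        return math.sqrt(sum((pb - pa) ** 2 for pa, pb in zip(prefix[a], prefix[b])))

    bounds = [0, n]  # segment boundaries, including both ends
    while len(bounds) - 1 < k:
        best_gain, best_cut = None, None
        for a, b in zip(bounds, bounds[1:]):
            for cut in range(a + 1, b):
                gain = score(a, cut) + score(cut, b) - score(a, b)
                if best_gain is None or gain > best_gain:
                    best_gain, best_cut = gain, cut
        bounds.append(best_cut)
        bounds.sort()
    return bounds[1:-1]


# Toy stand-ins for sentence embeddings (assumption): three sentences on one
# topic followed by three on another.
toy_vectors = [
    [1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
    [0.0, 1.0], [0.1, 0.9], [0.0, 1.0],
]
print(greedy_segment(toy_vectors, 2))  # prints [3]
```

With real data, each vector would come from the embeddings of one sentence or word, and the returned cut indices mark where one segment ends and the next begins. Greedy search is fast but not guaranteed to find the best overall split, since earlier cuts are never revisited.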
Stars: 33
Forks: 14
Language: Jupyter Notebook
License: not specified
Category:
Last pushed: Feb 17, 2019
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/ReemHal/Semantic-Text-Segmentation-with-Embeddings"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
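The same request can be made from Python using only the standard library. This is a sketch under assumptions: the path layout (category, then owner, then repository) is inferred from the single URL above, the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path layout inferred from the one example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record for one repository.

    Assumes the body is JSON; adjust the parsing if the API returns
    something else.
    """
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("embeddings", "ReemHal", "Semantic-Text-Segmentation-with-Embeddings")
print(url)
```

Calling `fetch_quality("embeddings", "ReemHal", "Semantic-Text-Segmentation-with-Embeddings")` performs the actual network request and counts against the daily request limit.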
Higher-rated alternatives
jparkerweb/semantic-chunking
🍱 semantic-chunking ⇢ semantically create chunks from large document for passing to LLM workflows
drittich/SemanticSlicer
🧠✂️ SemanticSlicer — A smart text chunker for LLM-ready documents.
smart-models/Normalized-Semantic-Chunker
Cutting-edge tool that unlocks the full potential of semantic chunking
ndgigliotti/afterthoughts
Sentence-aware embeddings using late chunking with transformers.
agamm/semantic-split
A Python library to chunk/group your texts based on semantic similarity.