zaemyung/sentsplit
A flexible sentence segmentation library using a CRF model and regex rules
This tool helps you break down longer pieces of text, like articles or reports, into individual sentences. You feed it a paragraph or a document, and it outputs a list of clearly separated sentences, even handling tricky punctuation or language-specific sentence endings. Anyone working with text data who needs to process or analyze content sentence-by-sentence, such as researchers, linguists, or data analysts, would find this useful.
Used by 1 other package. Available on PyPI.
Use this if you need to accurately split text into sentences across multiple languages or require fine-grained control over how sentences are detected, including custom rules for domain-specific text.
Not ideal if you only need very basic English sentence splitting and don't require advanced customization or support for diverse languages and text formats.
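To see why a library like this combines regex rules with a statistical model, consider a purely regex-based splitter. The sketch below is not sentsplit's API; it is a hypothetical minimal example showing that naive punctuation rules break on abbreviations, which is exactly the kind of "tricky punctuation" the description mentions.

```python
import re

def naive_split(text: str) -> list[str]:
    """Naive regex-only sentence splitter (illustrative, NOT sentsplit's API).

    Splits after ., !, or ? when followed by whitespace and an
    uppercase letter.
    """
    parts = re.split(r'(?<=[.!?])\s+(?=[A-Z])', text)
    return [p.strip() for p in parts if p.strip()]

# Works on simple input:
print(naive_split("It works. Mostly well!"))
# → ['It works.', 'Mostly well!']

# Abbreviations trip it up: "Dr." is wrongly treated as a sentence end.
print(naive_split("Dr. Smith arrived. He was late."))
# → ['Dr.', 'Smith arrived.', 'He was late.']
```

Handling such cases with regex alone requires an ever-growing exception list; a CRF model trained on labeled boundaries generalizes instead, which is the design trade-off this library is built around.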
Stars: 31
Forks: 9
Language: Python
License: MIT
Category:
Last pushed: Feb 22, 2026
Commits (30d): 0
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/zaemyung/sentsplit"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
natasha/razdel
Rule-based token, sentence segmentation for Russian language
polm/cutlet
Japanese to romaji converter in Python