natasha/razdel
Rule-based token and sentence segmentation for the Russian language
This tool helps anyone working with Russian text split sentences into individual tokens (words and punctuation marks) and longer texts into separate sentences. You provide raw Russian text, and it returns a list of its constituent parts. It is well suited to linguists, researchers, and data analysts processing large volumes of Russian-language content.
279 stars. Used by 4 other packages. No commits in the last 6 months. Available on PyPI.
Use this if you need to accurately split Russian news articles, fiction, or similar formal texts into words and sentences for further analysis.
Not ideal if your Russian text comes from social media, scientific papers, or legal documents, as its rules are optimized for news and fiction.
Stars
279
Forks
34
Language
Python
License
MIT
Category
NLP
Last pushed
Jul 24, 2023
Commits (30d)
0
Reverse dependents
4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/natasha/razdel"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
zaemyung/sentsplit
A flexible sentence segmentation library using CRF model and regex rules
polm/cutlet
Japanese to romaji converter in Python