wwwcojp/ja_sentence_segmenter
Japanese sentence segmentation library for Python
When working with Japanese text, this tool automatically breaks long passages into individual sentences: you feed it a block of Japanese text, and it returns a list of cleanly separated sentences. This is useful for anyone analyzing Japanese content, such as researchers, linguists, or data analysts.
No commits in the last 6 months. Available on PyPI.
Use this if you need to reliably identify and extract individual sentences from raw Japanese text for further analysis or processing.
Not ideal if your primary need is for advanced linguistic parsing beyond simple sentence boundaries, such as morphological analysis or dependency parsing.
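The core idea behind rule-based Japanese sentence segmentation can be sketched with a short regex-based splitter. This is a standalone illustration of the technique, not ja_sentence_segmenter's actual API (the library adds normalization and concatenation rules on top of splitting):

```python
import re

# Japanese sentences typically end with 。, ！, or ？ (plus half-width ! and ?).
# Split *after* each terminator using a lookbehind, so punctuation stays
# attached to its sentence.
_SENT_END = re.compile(r"(?<=[。！？!?])")

def split_sentences(text: str) -> list[str]:
    """Naively split Japanese text on newlines and sentence-final punctuation."""
    sentences = []
    for line in text.splitlines() or [text]:
        for chunk in _SENT_END.split(line):
            chunk = chunk.strip()
            if chunk:
                sentences.append(chunk)
    return sentences

text = "今日は晴れです。明日は雨が降るでしょうか？私は傘を持っていきます！"
print(split_sentences(text))
# → ['今日は晴れです。', '明日は雨が降るでしょうか？', '私は傘を持っていきます！']
```

A real segmenter also has to handle cases this sketch ignores, such as quoted sentences inside 「」 brackets and abbreviations, which is where a dedicated library earns its keep.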
Stars
74
Forks
2
Language
Python
License
MIT
Category
NLP
Last pushed
Apr 03, 2023
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/wwwcojp/ja_sentence_segmenter"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
zaemyung/sentsplit
A flexible sentence segmentation library using CRF model and regex rules
natasha/razdel
Rule-based token, sentence segmentation for Russian language