KoichiYasuoka/SuPar-UniDic
Tokenizer, POS-tagger, lemmatizer, and dependency-parser for modern and contemporary Japanese with BERT models
This tool analyzes Japanese text by breaking sentences down into their linguistic components. It takes raw Japanese sentences as input and outputs word segmentation, part-of-speech tags, lemmas (base forms of words), and the dependency relations between words. It is aimed at people who work on Japanese text analysis, such as computational linguists and social science researchers.
Use this if you need a detailed grammatical analysis (segmentation, parts of speech, lemmas, and dependencies) of modern and contemporary Japanese text.
Not ideal if you only need basic keyword extraction or are working with languages other than Japanese.
Stars: 20
Forks: 4
Language: Jupyter Notebook
License: MIT
Category
Last pushed: Feb 28, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/KoichiYasuoka/SuPar-UniDic"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
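The same request can be made from Python. The following is a minimal sketch using only the standard library. The base URL and the `nlp` path segment are copied from the curl command above; the helper names `quality_url` and `fetch_quality` are hypothetical, and treating the last two path segments as an owner/repository pair is an assumption. The response format is not documented on this page, so the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; the "nlp" segment is kept as-is.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper).

    Assumes the URL shape is <base>/<owner>/<repo>, as in the curl example.
    """
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the raw response body as text.

    No assumption is made about the payload format (JSON or otherwise).
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_quality("KoichiYasuoka", "SuPar-UniDic")` issues the same GET request as the curl command and counts against the daily request limit.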
Higher-rated alternatives
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
zaemyung/sentsplit
A flexible sentence segmentation library using CRF model and regex rules
natasha/razdel
Rule-based token, sentence segmentation for Russian language