KoichiYasuoka/SuPar-UniDic

Tokenizer, POS-tagger, Lemmatizer, and Dependency-parser for modern and contemporary Japanese with BERT models

/ 100

Emerging

This tool helps researchers and linguists analyze Japanese text by breaking down sentences into their core components. It takes raw Japanese sentences as input and outputs detailed linguistic annotations, including word segmentation, parts of speech, base forms of words, and the grammatical relationships between them. Anyone working with Japanese text analysis, such as computational linguists or social science researchers, would find this useful.
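The four annotation layers described above (segmentation, part of speech, base form, and grammatical relations) can be pictured as one record per token. The sketch below is plain Python and does not call SuPar-UniDic itself. The `Token` class, the helper functions, and the sample sentence are hypothetical, the annotation is written by hand rather than produced by the tool, and the Universal Dependencies style labels are an assumption about the output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    index: int   # 1-based position in the sentence
    form: str    # surface form produced by word segmentation
    lemma: str   # base (dictionary) form
    upos: str    # part-of-speech tag (UD-style, assumed)
    head: int    # index of the governing token, 0 marks the root
    deprel: str  # grammatical relation to the head (UD-style, assumed)


# Hand-written illustrative annotation of 猫が魚を食べた ("The cat ate the fish").
# This is NOT actual tool output.
SENTENCE: List[Token] = [
    Token(1, "猫", "猫", "NOUN", 5, "nsubj"),
    Token(2, "が", "が", "ADP", 1, "case"),
    Token(3, "魚", "魚", "NOUN", 5, "obj"),
    Token(4, "を", "を", "ADP", 3, "case"),
    Token(5, "食べ", "食べる", "VERB", 0, "root"),
    Token(6, "た", "た", "AUX", 5, "aux"),
]


def root(tokens: List[Token]) -> Token:
    """Return the token whose head is 0, i.e. the root of the dependency tree."""
    return next(t for t in tokens if t.head == 0)


def dependents(tokens: List[Token], head_index: int) -> List[Token]:
    """Return the tokens directly governed by the token at head_index."""
    return [t for t in tokens if t.head == head_index]


if __name__ == "__main__":
    for t in SENTENCE:
        print(t.index, t.form, t.lemma, t.upos, t.head, t.deprel, sep="\t")
```

With records like these, a question such as "what is the object of the main verb?" becomes a lookup over `dependents(SENTENCE, root(SENTENCE).index)` filtered by `deprel`.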

Use this if you need a detailed morphological and syntactic analysis of modern and contemporary Japanese text.

Not ideal if you only need basic keyword extraction or are working with languages other than Japanese.

Japanese-linguistics text-analysis NLP computational-linguistics corpus-analysis

No Package No Dependents

Maintenance 10 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 15 / 25

Language Jupyter Notebook

License MIT

Higher-rated alternatives

EmilStenstrom/conllu

A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.

OpenPecha/Botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

taishi-i/nagisa

A Japanese tokenizer based on recurrent neural networks

zaemyung/sentsplit

A flexible sentence segmentation library using CRF model and regex rules

natasha/razdel

Rule-based token, sentence segmentation for Russian language