ku-nlp/jumanpp

Juman++ (a Morphological Analyzer Toolkit)

44 / 100

Emerging

This tool helps you break down Japanese text into its individual words and their grammatical properties, like parts of speech and base forms. You feed it raw Japanese sentences, and it outputs a detailed linguistic analysis for each word. It's designed for linguists, researchers, or anyone working with natural language processing of Japanese text.
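As a rough illustration of the "raw sentence in, per-word analysis out" workflow described above, here is a minimal Python sketch. It assumes the `jumanpp` binary is installed and on your PATH, that it reads sentences from standard input, and that it prints JUMAN-style output: one morpheme per line with 12 space-separated columns, alternative analyses prefixed with `@ `, and a closing `EOS` line. The `Morpheme` class, the helper names, and the sample text are hypothetical and hand-written for this example; they are not part of Juman++ itself.

```python
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class Morpheme:
    surface: str    # the word as it appears in the text
    reading: str    # kana reading
    base_form: str  # dictionary (base) form
    pos: str        # part of speech
    sub_pos: str    # part-of-speech subcategory
    features: str   # semantic info, or "NIL" when absent


def parse_juman_output(text: str) -> List[Morpheme]:
    """Parse JUMAN-style lines (assumed 12-column layout) into Morpheme records."""
    morphemes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        if line.startswith("@ "):
            continue  # skip alternative analyses in this sketch
        # Limit the split so the quoted feature string, which may contain
        # spaces, stays in one piece as the 12th field.
        fields = line.split(" ", 11)
        if len(fields) < 12:
            continue  # not a morpheme line in the assumed layout
        morphemes.append(
            Morpheme(
                surface=fields[0],
                reading=fields[1],
                base_form=fields[2],
                pos=fields[3],
                sub_pos=fields[5],
                features=fields[11].strip('"'),
            )
        )
    return morphemes


def analyze(sentence: str) -> List[Morpheme]:
    """Run the jumanpp binary (assumed to be on PATH) on one sentence."""
    result = subprocess.run(
        ["jumanpp"],
        input=sentence + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_juman_output(result.stdout)


# Hand-written sample in the assumed output layout, for illustration only.
SAMPLE = """猫 ねこ 猫 名詞 6 普通名詞 1 * 0 * 0 "代表表記:猫/ねこ カテゴリ:動物"
が が が 助詞 9 格助詞 1 * 0 * 0 NIL
EOS
"""

if __name__ == "__main__":
    for m in parse_juman_output(SAMPLE):
        print(m.surface, m.base_form, m.pos, m.sub_pos)
```

Keeping the parser separate from the subprocess call means you can check the parsing logic against saved output without having Juman++ installed.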

409 stars. No commits in the last 6 months.

Use this if you need detailed morphological analysis of Japanese text, especially when you need accurate segmentation that takes the semantic plausibility of word sequences into account.

Not ideal if you're working with languages other than Japanese, or if you only need very basic text splitting without deep linguistic insight.

Japanese-language linguistics natural-language-processing text-analysis computational-linguistics

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 18 / 25

Stars

409

Forks

Language

C++

License

Apache-2.0

Higher-rated alternatives

EmilStenstrom/conllu

A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.

OpenPecha/Botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

zaemyung/sentsplit

A flexible sentence segmentation library using CRF model and regex rules

taishi-i/nagisa

A Japanese tokenizer based on recurrent neural networks

natasha/razdel

Rule-based token, sentence segmentation for Russian language
