mkartawijaya/dango
An easy-to-use tokenizer for Japanese text, aimed at language learners and non-linguists
This tool helps Japanese language learners and non-linguists break down Japanese sentences into individual words. You input raw Japanese text, and it outputs the text segmented into words, along with details like dictionary forms, parts of speech (verb, noun, etc.), and hiragana readings for Kanji. It's designed for anyone studying Japanese or needing to understand the structure of Japanese text without deep linguistic knowledge.
No commits in the last 6 months. Available on PyPI.
Use this if you need to quickly extract vocabulary, understand sentence structure, or prepare learning materials from Japanese texts.
Not ideal if you need fine-grained linguistic analysis, since it prioritizes user-friendly word segmentation over detailed morphological breakdown.
Stars: 25
Forks: 3
Language: Python
License: BSD-3-Clause
Category:
Last pushed: Nov 21, 2021
Commits (30d): 0
Dependencies: 3
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/mkartawijaya/dango"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
zaemyung/sentsplit
A flexible sentence segmentation library using CRF model and regex rules
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
natasha/razdel
Rule-based token, sentence segmentation for Russian language