mkartawijaya/dango
An easy-to-use tokenizer for Japanese text, aimed at language learners and non-linguists
This tool helps Japanese language learners and non-linguists break down Japanese sentences into individual words. You input raw Japanese text, and it outputs the text segmented into words, along with details like dictionary forms, parts of speech (verb, noun, etc.), and hiragana readings for Kanji. It's designed for anyone studying Japanese or needing to understand the structure of Japanese text without deep linguistic knowledge.
No commits in the last 6 months. Available on PyPI.
Use this if you need to quickly extract vocabulary, understand sentence structure, or prepare learning materials from Japanese texts.
Not ideal if you need fine-grained linguistic analysis, since it prioritizes user-friendly word segmentation over detailed morphological breakdown.
Stars: 25
Forks: 3
Language: Python
License: BSD-3-Clause
Category:
Last pushed: Nov 21, 2021
Commits (30d): 0
Dependencies: 3
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/mkartawijaya/dango"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
zaemyung/sentsplit
A flexible sentence segmentation library using CRF model and regex rules
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
natasha/razdel
Rule-based token, sentence segmentation for Russian language