tokuhirom/jawiki-kana-kanji-dict
Generate SKK/MeCab dictionaries from Wikipedia (Japanese edition)
This project helps developers and linguists improve Japanese text input and analysis by providing up-to-date dictionary files. It takes the latest Japanese Wikipedia data and converts it into SKK and MeCab dictionary formats. The output is ready-to-use dictionary files that can enhance the accuracy of kana-to-kanji conversion and natural language processing tasks.
Use this if you need frequently updated, comprehensive Japanese dictionary files for SKK input methods or MeCab natural language processing.
Not ideal if you require a dictionary for a highly specialized domain not covered by general Wikipedia content, or if you prefer to build dictionaries from scratch with proprietary data sources.
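For context on the two output formats: an SKK dictionary entry maps a reading to slash-delimited conversion candidates, while a MeCab dictionary entry is a CSV row of surface form, connection IDs, cost, and feature columns. A minimal sketch of emitting one entry in each shape (the example words, IDs, and cost are illustrative, not taken from this project's actual output):

```python
def skk_entry(reading, candidates):
    """Format one SKK dictionary line: 'reading /cand1/cand2/'."""
    return f"{reading} /{'/'.join(candidates)}/"

def mecab_entry(surface, left_id, right_id, cost, features):
    """Format one MeCab CSV row: surface,left-id,right-id,cost,feature1,..."""
    return ",".join([surface, str(left_id), str(right_id), str(cost)] + features)

print(skk_entry("とうきょう", ["東京"]))            # とうきょう /東京/
print(mecab_entry("東京", 1288, 1288, 5000,
                  ["名詞", "固有名詞", "とうきょう"]))
```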
Stars: 60
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Mar 16, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/tokuhirom/jawiki-kana-kanji-dict"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
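The curl call above can be reproduced in Python with the standard library. The sketch below builds the endpoint URL and fetches the body; since the response schema is not documented here, the code simply decodes and returns whatever JSON the server sends:

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"

def build_url(owner, repo):
    """Construct the per-repository endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner, repo, timeout=10):
    """GET the endpoint and decode the JSON body (schema not assumed)."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Usage (performs a network request):
#   data = fetch_quality("tokuhirom", "jawiki-kana-kanji-dict")
```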
Higher-rated alternatives
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
zaemyung/sentsplit
A flexible sentence segmentation library using CRF model and regex rules
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
natasha/razdel
Rule-based token, sentence segmentation for Russian language