tokuhirom/jawiki-kana-kanji-dict
Generate SKK/MeCab dictionaries from Wikipedia (Japanese edition)
This project helps developers and linguists improve Japanese text input and analysis by providing up-to-date dictionary files. It takes the latest Japanese Wikipedia data and converts it into SKK and MeCab dictionary formats. The output is ready-to-use dictionary files that can enhance the accuracy of kana-to-kanji conversion and natural language processing tasks.
Use this if you need frequently updated, comprehensive Japanese dictionary files for SKK input methods or MeCab natural language processing.
Not ideal if you require a dictionary for a highly specialized domain not covered by general Wikipedia content, or if you prefer to build dictionaries from scratch with proprietary data sources.
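For context on the two output formats: an SKK dictionary entry maps a reading to slash-delimited conversion candidates, while a MeCab dictionary entry is a CSV row of surface form, connection IDs, cost, and feature columns. A minimal sketch of emitting one entry in each shape (the example words, IDs, and cost are illustrative, not taken from this project's actual output):

```python
def skk_entry(reading, candidates):
    """Format one SKK dictionary line: 'reading /cand1/cand2/'."""
    return f"{reading} /{'/'.join(candidates)}/"

def mecab_entry(surface, left_id, right_id, cost, features):
    """Format one MeCab CSV row: surface,left-id,right-id,cost,feature1,..."""
    return ",".join([surface, str(left_id), str(right_id), str(cost)] + features)

print(skk_entry("とうきょう", ["東京"]))            # とうきょう /東京/
print(mecab_entry("東京", 1288, 1288, 5000,
                  ["名詞", "固有名詞", "とうきょう"]))
```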
Stars: 60
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Mar 16, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/tokuhirom/jawiki-kana-kanji-dict"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
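The curl call above can be reproduced in Python with the standard library. The sketch below builds the endpoint URL and fetches the body; since the response schema is not documented here, the code simply decodes and returns whatever JSON the server sends:

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"

def build_url(owner, repo):
    """Construct the per-repository endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner, repo, timeout=10):
    """GET the endpoint and decode the JSON body (schema not assumed)."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Usage (performs a network request):
#   data = fetch_quality("tokuhirom", "jawiki-kana-kanji-dict")
```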
Higher-rated alternatives
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
zaemyung/sentsplit
A flexible sentence segmentation library using CRF model and regex rules
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
natasha/razdel
Rule-based token, sentence segmentation for Russian language