tetutaro/mecab_dictionaries
Create various dictionaries for MeCab and the MeCab CLI using fugashi
When performing Japanese natural language processing, you need specialized dictionaries to accurately segment sentences into individual words. This project provides scripts that build ready-to-use Python packages of various MeCab dictionaries, including the UniDic, IPA, and JUMAN dictionaries, optionally enhanced with NEologd. It is aimed at data scientists and researchers who need precise Japanese text analysis.
No commits in the last 6 months.
Use this if you are a developer working on Japanese text analysis and need to quickly set up MeCab dictionaries within your Python environment, especially when using 'fugashi' or 'mecab-python3'.
Not ideal if you're looking for a pre-packaged application or a non-technical solution to analyze Japanese text without needing to build or manage dictionary resources yourself.
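The dictionary you choose determines the feature columns each token carries, which is why builds for UniDic, IPA, and JUMAN differ. As a rough illustration (not part of this project), a minimal parser for MeCab's default tab-separated output might look like the sketch below; the sample line and its IPAdic-style feature layout are assumptions for demonstration:

```python
# Hypothetical MeCab output line in IPAdic format:
# surface form, a tab, then comma-separated features
# (part of speech, conjugation, base form, reading, ...).
line = "走る\t動詞,自立,*,*,五段・ラ行,基本形,走る,ハシル,ハシル"

def parse_mecab_line(line: str) -> dict:
    """Split one MeCab output line into surface and feature fields."""
    surface, feature_str = line.split("\t", 1)
    features = feature_str.split(",")
    # The first feature field is the coarse part-of-speech tag
    # in both IPAdic and UniDic layouts.
    return {"surface": surface, "pos": features[0], "features": features}

token = parse_mecab_line(line)
print(token["surface"], token["pos"])  # 走る 動詞
```

Note that UniDic emits more (and differently ordered) feature columns than IPAdic, so code that indexes into `features` must know which dictionary produced the output.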
Stars: 8
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Feb 19, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/tetutaro/mecab_dictionaries"
Open to everyone: 100 requests per day with no key needed. Get a free key for 1,000 per day.
Higher-rated alternatives
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
zaemyung/sentsplit
A flexible sentence segmentation library using CRF model and regex rules
natasha/razdel
Rule-based token, sentence segmentation for Russian language