polm/fugashi
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
fugashi helps you analyze Japanese text by breaking it down into individual words and their grammatical features. You provide raw Japanese sentences, and it outputs a list of words along with details like their root form and part of speech. This is ideal for linguists, data analysts, and researchers who need to understand the structure and meaning of Japanese text.
Use this if you need to quickly and accurately break down Japanese sentences into words and identify their grammatical properties for analysis or natural language processing tasks.
Not ideal if you are working with languages other than Japanese (such as Korean), or if you want a tokenizer with no MeCab dependency at all.
Stars
515
Forks
39
Language
C++
License
MIT
Category
NLP
Last pushed
Oct 24, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/polm/fugashi"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
zaemyung/sentsplit
A flexible sentence segmentation library using CRF model and regex rules
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
natasha/razdel
Rule-based token, sentence segmentation for Russian language