polm/fugashi
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
fugashi helps you analyze Japanese text by breaking it down into individual words and their grammatical features. You provide raw Japanese sentences, and it outputs a list of words along with details like their root form and part of speech. This is ideal for linguists, data analysts, and researchers who need to understand the structure and meaning of Japanese text.
Use this if you need to quickly and accurately break down Japanese sentences into words and identify their grammatical properties for analysis or natural language processing tasks.
Not ideal if you are working with languages other than Japanese (such as Korean), or if you want a tokenizer with no MeCab dependency at all.
Stars
515
Forks
39
Language
C++
License
MIT
Category
NLP
Last pushed
Oct 24, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/polm/fugashi"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
zaemyung/sentsplit
A flexible sentence segmentation library using CRF model and regex rules
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
natasha/razdel
Rule-based token, sentence segmentation for Russian language