himkt/konoha
🌿 An easy-to-use Japanese text processing tool that makes it possible to switch tokenizers with minimal code changes.
When you need to break Japanese text into individual words or sentences for analysis, this tool helps you do it consistently: you provide raw Japanese text, and it returns the text segmented into meaningful units such as words or sentences. It is aimed at data scientists, linguists, and anyone who needs to prepare Japanese text for further computational processing.
Use this if you need to reliably split Japanese text into words or sentences and want the flexibility to easily switch between different text segmentation methods.
Not ideal if you are looking for advanced natural language understanding features beyond basic text segmentation, such as sentiment analysis or named entity recognition.
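Konoha's main selling point is that the tokenizer backend is selected by name, so swapping segmentation methods is a one-line change. A minimal sketch of that pattern, based on the `WordTokenizer`/`SentenceTokenizer` API shown in konoha's README (which backends actually work depends on the optional dependencies you install, e.g. MeCab):

```python
# Sketch of konoha's switch-by-name tokenizer API, per its README.
# Requires `pip install konoha`; backends like MeCab need extras,
# e.g. `pip install 'konoha[mecab]'`.
from konoha import SentenceTokenizer, WordTokenizer

text = "自然言語処理を勉強しています。とても楽しいです。"

# Sentence segmentation (rule-based, no extra backend needed).
sentences = SentenceTokenizer().tokenize(text)

# Word segmentation: switching methods is just a different name string.
for name in ("MeCab", "Character", "Whitespace"):
    tokenizer = WordTokenizer(name)  # raises if the backend isn't installed
    print(name, tokenizer.tokenize(sentences[0]))
```

This sketch prints each sentence's tokens once per backend; in practice you would pick one backend and keep the rest of your pipeline unchanged when you swap it.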
Stars: 261
Forks: 26
Language: Python
License: MIT
Category:
Last pushed: Mar 01, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/himkt/konoha"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
zaemyung/sentsplit
A flexible sentence segmentation library using a CRF model and regex rules
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
natasha/razdel
Rule-based token and sentence segmentation for the Russian language