daac-tools/vibrato

🎤 vibrato: Viterbi-based accelerated tokenizer


Established

This tool splits Japanese text into individual words and identifies their grammatical roles, a process known as tokenization or morphological analysis. You input raw Japanese sentences, and it outputs a list of words with details such as part of speech, reading, and base form. It suits anyone working with Japanese language data, such as computational linguists, researchers, and data analysts, who need to process text efficiently for further analysis.
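To make the input and output described above concrete, here is a minimal, self-contained sketch of Viterbi-style tokenization over a word lattice. It is not vibrato's API or implementation: the `Token` struct, `toy_lexicon`, `tokenize`, and every cost value are hypothetical, and the sketch scores paths by word cost only, whereas real lattice-based analyzers typically also score the connection between adjacent tokens.

```rust
use std::collections::HashMap;

/// Hypothetical lexicon type: surface form -> (part of speech, word cost).
type Lexicon = HashMap<&'static str, (&'static str, i32)>;

/// One output token: the surface string and its part-of-speech label.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub surface: String,
    pub pos: &'static str,
}

/// A toy lexicon with made-up costs (lower cost = more preferred).
pub fn toy_lexicon() -> Lexicon {
    HashMap::from([
        ("東京都", ("noun", 4)),
        ("東京", ("noun", 3)),
        ("京都", ("noun", 3)),
        ("東", ("noun", 5)),
        ("都", ("noun", 5)),
        ("に", ("particle", 1)),
        ("住む", ("verb", 3)),
    ])
}

/// Finds the lowest-cost segmentation of `text` by dynamic programming.
/// Returns `None` if no sequence of lexicon entries covers the whole text.
pub fn tokenize(text: &str, lexicon: &Lexicon) -> Option<Vec<Token>> {
    let chars: Vec<char> = text.chars().collect();
    let n = chars.len();

    // best[i] holds (minimum cost to reach char position i,
    //                start position of the last word, its part of speech).
    let mut best: Vec<Option<(i32, usize, &'static str)>> = vec![None; n + 1];
    best[0] = Some((0, 0, ""));

    for end in 1..=n {
        for start in 0..end {
            // Skip start positions that no path reaches.
            let Some((prev_cost, _, _)) = best[start] else { continue };
            let surface: String = chars[start..end].iter().collect();
            if let Some(&(pos, cost)) = lexicon.get(surface.as_str()) {
                let total = prev_cost + cost;
                if best[end].map_or(true, |(c, _, _)| total < c) {
                    best[end] = Some((total, start, pos));
                }
            }
        }
    }

    // Walk the back pointers from the end of the text to the beginning.
    let mut tokens = Vec::new();
    let mut end = n;
    while end > 0 {
        let (_, start, pos) = best[end]?;
        tokens.push(Token {
            surface: chars[start..end].iter().collect(),
            pos,
        });
        end = start;
    }
    tokens.reverse();
    Some(tokens)
}
```

With this toy lexicon, `tokenize("東京都に住む", &toy_lexicon())` yields the three tokens 東京都 (noun), に (particle), and 住む (verb), because that path has a lower total cost than splitting into 東京 + 都 or 東 + 京都. Text containing a character the lexicon cannot cover returns `None`; production analyzers handle such cases with unknown-word processing, which this sketch leaves out.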

404 stars and 3,503 monthly downloads.

Use this if you need fast, accurate tokenization of Japanese text, especially when processing large volumes of data or when throughput is critical.

Not ideal if you primarily work with languages other than Japanese, as it's specifically designed for Japanese morphological analysis.

Japanese-language-processing text-analysis natural-language-understanding computational-linguistics data-mining

No Package No Dependents

Maintenance 10 / 25

Adoption 18 / 25

Maturity 16 / 25

Community 13 / 25


Stars: 404

Language: Rust

License: Apache-2.0

Related tools

google/sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

OpenNMT/Tokenizer

Fast and customizable text tokenization library with BPE and SentencePiece support

Systemcluster/kitoken

Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers,...

daac-tools/vaporetto

🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer

soaxelbrooke/python-bpe

Byte Pair Encoding for Python!
