daac-tools/vibrato
🎤 vibrato: Viterbi-based accelerated tokenizer
This tool splits Japanese text into individual words and labels each with its grammatical role, a process known as tokenization or morphological analysis. You input raw Japanese sentences, and it outputs a list of words with details such as part of speech, readings, and base forms. It suits anyone working with Japanese language data, such as computational linguists, researchers, or data analysts, who needs to process text efficiently for further analysis.
404 stars and 3,503 monthly downloads.
Use this if you need a very fast and accurate way to tokenize Japanese text, especially when working with large volumes of data or when speed is critical.
Not ideal if you primarily work with languages other than Japanese, as it's specifically designed for Japanese morphological analysis.
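To make "Viterbi-based" tokenization concrete, here is a minimal toy sketch in Python. It is not vibrato's implementation or API: the lexicon, part-of-speech tags, word costs, and connection costs below are all hypothetical values chosen for illustration, and unknown-word handling is omitted (the sketch simply raises an error). It builds a lattice of every dictionary word that matches the input and keeps, for each lattice node, the cheapest path that reaches it.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    surface: str
    pos: str
    cost: int


# Hypothetical toy lexicon; real dictionaries hold hundreds of thousands of entries.
LEXICON = [
    Entry("東京都", "noun", 4),
    Entry("東京", "noun", 3),
    Entry("京都", "noun", 3),
    Entry("東", "noun", 5),
    Entry("京", "noun", 6),
    Entry("都", "suffix", 4),
    Entry("に", "particle", 1),
    Entry("住む", "verb", 3),
]

# Hypothetical connection costs between adjacent parts of speech.
CONNECTION = {
    ("noun", "noun"): 2,
    ("noun", "suffix"): 0,
    ("noun", "particle"): 0,
    ("particle", "verb"): 0,
}
DEFAULT_CONNECTION = 3


@dataclass
class Node:
    cost: int                # cheapest total cost of any path ending at this node
    entry: Entry
    prev: Optional["Node"]   # back-pointer along that cheapest path


def connection_cost(left_pos: str, right_pos: str) -> int:
    if left_pos == "BOS":
        return 0  # no penalty for starting a sentence
    return CONNECTION.get((left_pos, right_pos), DEFAULT_CONNECTION)


def tokenize(text: str) -> list[tuple[str, str]]:
    """Return the minimum-cost segmentation as (surface, pos) pairs."""
    n = len(text)
    # ends[i] holds every lattice node whose word ends at character offset i.
    ends: list[list[Node]] = [[] for _ in range(n + 1)]
    ends[0].append(Node(0, Entry("", "BOS", 0), None))

    for start in range(n):
        if not ends[start]:
            continue  # no path reaches this offset
        for entry in LEXICON:
            if not text.startswith(entry.surface, start):
                continue
            # Viterbi step: pick the cheapest predecessor for this node.
            best = min(
                ends[start],
                key=lambda p: p.cost + connection_cost(p.entry.pos, entry.pos),
            )
            total = best.cost + connection_cost(best.entry.pos, entry.pos) + entry.cost
            ends[start + len(entry.surface)].append(Node(total, entry, best))

    if not ends[n]:
        raise ValueError("no segmentation found (unknown words are not handled)")

    # Follow back-pointers from the cheapest node that ends the sentence.
    node = min(ends[n], key=lambda p: p.cost)
    tokens = []
    while node.prev is not None:
        tokens.append((node.entry.surface, node.entry.pos))
        node = node.prev
    tokens.reverse()
    return tokens
```

With these toy costs, `tokenize("東京都に住む")` returns `[("東京都", "noun"), ("に", "particle"), ("住む", "verb")]`, because the single-word reading of 東京都 is cheaper than 東京 + 都 or 東 + 京都.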
Stars: 404
Forks: 23
Language: Rust
License: Apache-2.0
Category:
Last pushed: Feb 07, 2026
Monthly downloads: 3,503
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/daac-tools/vibrato"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
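The same request can be sketched in Python using only the standard library. This rests on two assumptions that the page does not confirm: that the path follows a category/owner/repo layout (inferred from the single URL above), and that the response body is JSON. The function names are hypothetical, and no API-key handling is shown because the page does not say how a key is passed.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    # Assumed layout: <base>/<category>/<owner>/<repo>, inferred from the curl example.
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    # Assumption: the endpoint returns a JSON document.
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "daac-tools", "vibrato")` issues the same GET request as the curl command; at 100 requests/day without a key, it is worth caching the result rather than polling.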
Related tools
google/sentencepiece: Unsupervised text tokenizer for Neural Network-based text generation.
OpenNMT/Tokenizer: Fast and customizable text tokenization library with BPE and SentencePiece support
Systemcluster/kitoken: Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers,...
daac-tools/vaporetto: 🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
soaxelbrooke/python-bpe: Byte Pair Encoding for Python!