frothywater/kanade-tokenizer
Kanade is a single-layer disentangled speech tokenizer that extracts compact tokens suitable for both generative and discriminative modeling.
This tool helps researchers and engineers working with spoken language models convert raw audio into a compact, numerical representation. You provide audio files as input, and it outputs disentangled speech tokens that can be used for tasks like voice synthesis or speech recognition. It's designed for those developing or training advanced speech-related AI.
Use this if you need to process spoken audio into discrete, manageable tokens for developing generative or discriminative speech models.
Not ideal if you're looking for a direct, end-user application for transcribing audio to text or generating speech without developing models yourself.
Stars: 85
Forks: 11
Language: Python
License: none listed
Category: none listed
Last pushed: Feb 03, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/frothywater/kanade-tokenizer"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
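The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It assumes the endpoint returns JSON, which the page does not state, and it assumes the URL path follows a category/owner/repo pattern, inferred from the single curl example above. The function names build_quality_url and fetch_quality are hypothetical helpers written for this illustration and are not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL (hypothetical helper).

    Assumption: the path is <category>/<owner>/<repo>, as inferred
    from the one URL shown in the curl example.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the quality record for one repository (hypothetical helper).

    Assumption: the response body is JSON. The field names are not
    documented on this page, so the parsed value is returned unchanged.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("nlp", "frothywater", "kanade-tokenizer")
# print(data)
```

Since keyless access is limited to 100 requests/day, a script that polls many repositories may want to cache responses locally rather than refetch them on every run.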
Higher-rated alternatives
guillaume-be/rust-tokenizers
Rust-tokenizer offers high-performance tokenizers for modern language models, including...
sugarme/tokenizer
NLP tokenizers written in Go language
elixir-nx/tokenizers
Elixir bindings for 🤗 Tokenizers
openscilab/tocount
ToCount: Lightweight Token Estimator
reinfer/blingfire-rs
Rust wrapper for the BlingFire tokenization library