jorge-menjivar/tekken-rs
Rust implementation of the Mistral Tekken tokenizer
A Rust library for developers building applications that process text and audio with Mistral AI's large language models. It converts raw text or WAV audio files into numerical tokens and reconstructs text from tokens, so you can prepare model inputs and interpret model outputs.
Use this if you are a Rust developer working with Mistral AI models and need a fast, efficient, and fully compatible tokenizer for both text and audio data.
Not ideal if you do not work in Rust or if your project does not use Mistral AI's tokenization scheme.
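To make the text-to-tokens round trip concrete, here is a minimal, self-contained sketch. It is a toy byte-level scheme and does not use the tekken-rs API: the function names `encode` and `decode`, the constant `NUM_SPECIAL`, and the `BOS` id are all hypothetical and chosen only for illustration. A real tokenizer such as Tekken relies on a learned subword vocabulary, which this sketch skips entirely.

```rust
// Toy byte-level tokenizer: illustrative only, NOT the tekken-rs API.
// Assumption: ids below NUM_SPECIAL are reserved for special tokens,
// so a raw byte b maps to the id NUM_SPECIAL + b.
const NUM_SPECIAL: u32 = 3; // hypothetical count of reserved special ids
const BOS: u32 = 1; // hypothetical beginning-of-sequence id

/// Encode text as one token per UTF-8 byte, optionally prefixed with BOS.
pub fn encode(text: &str, add_bos: bool) -> Vec<u32> {
    let mut ids = Vec::with_capacity(text.len() + 1);
    if add_bos {
        ids.push(BOS);
    }
    ids.extend(text.bytes().map(|b| NUM_SPECIAL + u32::from(b)));
    ids
}

/// Decode ids back to text, skipping special tokens.
/// Returns None if an id is out of range or the bytes are not valid UTF-8.
pub fn decode(ids: &[u32]) -> Option<String> {
    let mut bytes = Vec::with_capacity(ids.len());
    for &id in ids {
        if id < NUM_SPECIAL {
            continue; // special tokens carry no text
        }
        let offset = id - NUM_SPECIAL;
        if offset > 255 {
            return None; // not a valid byte token in this toy scheme
        }
        bytes.push(offset as u8);
    }
    String::from_utf8(bytes).ok()
}
```

Under these assumptions, `encode("Hi", true)` yields one BOS id followed by one id per byte, and `decode` inverts it by dropping the special id and reassembling the bytes. For actual usage of tekken-rs, consult the crate's own documentation.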
Stars: 8
Forks: 1
Language: Rust
License: Apache-2.0
Category
Last pushed: Mar 16, 2026
Monthly downloads: 507
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/jorge-menjivar/tekken-rs"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
google/sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
daac-tools/vibrato
🎤 vibrato: Viterbi-based accelerated tokenizer
OpenNMT/Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Systemcluster/kitoken
Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers,...
daac-tools/vaporetto
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer