rishikksh20/voxtral-codec-pytoch
Voxtral Codec: Combining Semantic VQ and Acoustic FSQ for Ultra-Low Bitrate Speech Generation (Voxtral TTS Backbone)
This project encodes human speech into highly compressed discrete representations optimized for text-to-speech (TTS) systems. It takes 24 kHz mono speech recordings as input and outputs a stream of ultra-low-bitrate discrete codes that preserve both the semantic content and the acoustic characteristics of the original speech. Speech synthesis researchers and developers can use it to build more efficient, higher-quality TTS models.
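The title pairs a semantic vector quantizer (VQ) with an acoustic finite scalar quantizer (FSQ). The sketch below illustrates those two generic techniques in NumPy. It is not taken from this repository: the codebook size, the FSQ level counts, and the function names (`vq_quantize`, `fsq_quantize`, `bits_per_frame`) are hypothetical placeholders. VQ picks the nearest entry of a learned codebook, while FSQ bounds each latent dimension and rounds it onto a small fixed grid, so it needs no codebook at all.

```python
import numpy as np

# Hypothetical sizes for illustration only; not the repository's configuration.
SEMANTIC_CODEBOOK_SIZE = 8192   # K entries in the semantic VQ codebook
FSQ_LEVELS = (5, 5, 5, 5)       # odd number of levels per acoustic latent dimension


def vq_quantize(z, codebook):
    """Semantic VQ: index of the nearest codebook vector for each frame.

    z: (frames, dim) encoder outputs; codebook: (K, dim).
    """
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]


def fsq_quantize(z, levels=FSQ_LEVELS):
    """Acoustic FSQ: bound each dimension with tanh, then round to a fixed grid.

    z: (frames, len(levels)). Assumes odd level counts so the grid is
    symmetric around zero: {-(L-1)/2, ..., (L-1)/2}.
    """
    levels = np.asarray(levels)
    half = (levels - 1) / 2.0
    grid = np.round(np.tanh(z) * half)          # integer grid value per dimension
    digits = (grid + half).astype(np.int64)     # shift to 0..L-1
    # Mixed-radix packing turns the per-dimension digits into one index per frame.
    radix = np.concatenate(([1], np.cumprod(levels[:-1])))
    index = (digits * radix).sum(axis=-1)
    return index, grid / half                   # index and values normalized to [-1, 1]


def bits_per_frame(codebook_size=SEMANTIC_CODEBOOK_SIZE, levels=FSQ_LEVELS):
    """Information per frame: log2(K) for VQ plus the sum of log2(L_d) for FSQ."""
    return float(np.log2(codebook_size) + np.log2(np.asarray(levels)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(SEMANTIC_CODEBOOK_SIZE, 16))
    sem_idx, _ = vq_quantize(rng.normal(size=(10, 16)), codebook)
    ac_idx, _ = fsq_quantize(rng.normal(size=(10, len(FSQ_LEVELS))))
    print(sem_idx.shape, ac_idx.shape)   # (10,) (10,)
    print(round(bits_per_frame(), 2))    # 22.29
```

Multiplying `bits_per_frame()` by the codec's frame rate (frames per second, not stated here) gives the bitrate in bits per second, which is why a modest codebook, few FSQ levels, and a low frame rate together yield an ultra-low bitrate. During training, FSQ is usually paired with a straight-through estimator so gradients pass through the rounding step; that part is omitted from this sketch.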
Use this if you are a speech synthesis researcher or developer building text-to-speech systems and need an efficient way to convert raw speech audio into discrete, low-bitrate codes for model training.
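Because the codec expects 24 kHz mono input, raw recordings usually need to be downmixed and resampled first. A minimal NumPy sketch is shown below, assuming the waveform is already loaded as a float array shaped `(samples,)` or `(samples, channels)` (the layout `soundfile.read` returns). The helper name `to_codec_input` is hypothetical, and the linear-interpolation resampler is a crude stand-in chosen only to avoid dependencies: it applies no anti-aliasing filter, so a proper resampler such as `torchaudio.functional.resample` or `scipy.signal.resample_poly` is the better choice for real training data.

```python
import numpy as np

TARGET_SR = 24_000  # the codec's expected sample rate (24 kHz mono)


def to_codec_input(audio, sr, target_sr=TARGET_SR):
    """Hypothetical helper: convert audio to 24 kHz mono float32.

    audio: float array shaped (samples,) or (samples, channels), values in [-1, 1].
    Uses linear interpolation for resampling (no anti-aliasing filter),
    which is only adequate for a quick sketch.
    """
    audio = np.asarray(audio, dtype=np.float64)
    if audio.ndim == 2:                  # (samples, channels) -> mono by averaging
        audio = audio.mean(axis=1)
    if sr != target_sr:
        n_out = int(round(len(audio) * target_sr / sr))
        t_in = np.arange(len(audio)) / sr          # input sample times in seconds
        t_out = np.arange(n_out) / target_sr       # output sample times in seconds
        audio = np.interp(t_out, t_in, audio)      # linear interpolation
    return audio.astype(np.float32)


if __name__ == "__main__":
    stereo_48k = np.zeros((48_000, 2))             # one second of stereo silence
    print(to_codec_input(stereo_48k, 48_000).shape)  # (24000,)
```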
Not ideal if you need a pretrained, ready-to-use text-to-speech model, or a general-purpose audio compression tool for music or other non-speech audio.
Stars: 9
Forks: not listed
Language: Python
License: not listed
Category: not listed
Last pushed: Mar 27, 2026
Commits (30d): 0
Higher-rated alternatives
kan-bayashi/ParallelWaveGAN
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
fatchord/WaveRNN
WaveRNN Vocoder + TTS
shangeth/wavencoder
WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation,...
rishikksh20/iSTFTNet-pytorch
iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier...
seungwonpark/melgan
MelGAN vocoder (compatible with NVIDIA/tacotron2)