rishikksh20/voxtral-codec-pytoch

Voxtral Codec: Combining Semantic VQ and Acoustic FSQ for Ultra-Low Bitrate Speech Generation (Voxtral TTS Backbone)

19 / 100 (Experimental)

This project compresses human speech into compact discrete representations suited to text-to-speech (TTS) systems. It takes 24 kHz mono speech recordings as input and outputs a stream of ultra-low bitrate discrete codes that preserve both the semantic content and the acoustic characteristics of the original speech. Speech synthesis researchers and developers can use these codes to build more efficient, higher-quality TTS models.

Use this if you are a speech synthesis researcher or developer building text-to-speech systems and need an efficient way to convert raw speech audio into discrete, low-bitrate codes for model training.

Not ideal if you need a pre-trained, ready-to-use text-to-speech model or are looking for a general-purpose audio compression tool for music or other sound types.
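The title names two quantizers, vector quantization (VQ) for semantic codes and finite scalar quantization (FSQ) for acoustic codes. The following is a minimal, generic sketch of both ideas in plain Python, not the repository's implementation: the function names, the toy codebook, the level counts, and the frame rate used below are all hypothetical, since this page does not state the project's actual configuration.

```python
import math

def vq_encode(frame, codebook):
    """Return the index of the codebook vector closest to `frame`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))

def fsq_encode(frame, levels):
    """Quantize each dimension to a fixed grid and pack the result into one integer.

    Assumes every entry of `levels` is odd, which keeps the grid symmetric
    around zero and avoids the half-step offset needed for even level counts.
    """
    index, basis = 0, 1
    for value, n_levels in zip(frame, levels):
        half = (n_levels - 1) // 2
        # tanh bounds the value to (-1, 1); scaling and rounding snaps it to
        # one of n_levels integers, shifted here into the range [0, n_levels - 1].
        digit = round(math.tanh(value) * half) + half
        index += digit * basis   # mixed-radix packing of the per-dimension digits
        basis *= n_levels
    return index

def bitrate_bps(frame_rate_hz, codebook_size, levels):
    """Bits per second for one VQ index plus one FSQ index per frame."""
    bits_per_frame = math.log2(codebook_size) + sum(math.log2(n) for n in levels)
    return frame_rate_hz * bits_per_frame
```

With a hypothetical two-entry codebook `[[0.0, 0.0], [1.0, 1.0]]`, `vq_encode([0.9, 1.2], codebook)` selects entry 1. The bitrate helper shows why such codes are compact: at an assumed 10 frames per second, a 256-entry codebook plus two 4-level FSQ dimensions costs 12 bits per frame, or 120 bits per second.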

speech-synthesis text-to-speech audio-encoding voice-AI machine-learning-research
No License No Package No Dependents
Maintenance 13 / 25
Adoption 5 / 25
Maturity 1 / 25
Community 0 / 25


Stars: 9
Forks:
Language: Python
License:
Last pushed: Mar 27, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/rishikksh20/voxtral-codec-pytoch"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
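The same request as the curl command can be made from Python using only the standard library. This is a sketch under one assumption: that the endpoint returns JSON, which this page does not state. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    """Build the quality endpoint URL for one repository."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"

def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality record; assumes the response body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("voice-ai", "rishikksh20", "voxtral-codec-pytoch")` issues the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit.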