rishikksh20/voxtral-codec-pytoch

Voxtral Codec: Combining Semantic VQ and Acoustic FSQ for Ultra-Low Bitrate Speech Generation (Voxtral TTS Backbone)
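The description names two quantizers: vector quantization (VQ) for the semantic stream and finite scalar quantization (FSQ) for the acoustic stream. The pure-Python sketch below shows the inference-time operation of each. It assumes nearest-neighbour VQ with squared Euclidean distance and the common tanh-bounded form of FSQ with an odd number of levels per dimension. The function names (`vq_quantize`, `fsq_quantize`, `fsq_index`, `encode_frame`), the toy codebook and the level count are hypothetical illustrations, not this repository's API or configuration.

```python
import math
from typing import List, Sequence, Tuple


def vq_quantize(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Nearest-neighbour vector quantization (hypothetical helper).

    Returns the index of the codeword closest to `z` in squared Euclidean
    distance, together with that codeword.
    """
    def sq_dist(i: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(z, codebook[i]))

    best = min(range(len(codebook)), key=sq_dist)
    return best, list(codebook[best])


def fsq_quantize(z: Sequence[float], levels: int) -> List[int]:
    """Finite scalar quantization of one latent vector (hypothetical helper).

    Each dimension is bounded to (-1, 1) with tanh, scaled by (levels - 1) / 2
    and rounded, so it takes one of `levels` integer values in
    [-(levels - 1) / 2, (levels - 1) / 2]. This sketch assumes odd `levels`.
    """
    if levels < 3 or levels % 2 == 0:
        raise ValueError("this sketch assumes an odd number of levels >= 3")
    half = (levels - 1) // 2
    return [round(math.tanh(x) * half) for x in z]


def fsq_index(codes: Sequence[int], levels: int) -> int:
    """Pack per-dimension FSQ codes into one integer (mixed radix, base `levels`)."""
    half = (levels - 1) // 2
    index = 0
    for c in codes:
        index = index * levels + (c + half)  # shift each code into [0, levels - 1]
    return index


def encode_frame(semantic_latent, acoustic_latent, codebook, levels):
    """Hypothetical per-frame encoder: one VQ index plus a vector of FSQ codes."""
    semantic_index, _ = vq_quantize(semantic_latent, codebook)
    return semantic_index, fsq_quantize(acoustic_latent, levels)


# Toy usage with made-up values.
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
print(encode_frame([0.9, 1.2], [0.0, 3.0, -3.0], toy_codebook, levels=5))  # (1, [0, 2, -2])
print(fsq_index([0, 2, -2], levels=5))  # 70
```

FSQ needs no learned codebook: its implicit codebook has `levels ** dims` entries, which is why the packed index grows so quickly with the number of dimensions.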

Experimental

This project encodes human speech into highly compressed discrete representations optimized for text-to-speech (TTS) systems. It takes 24 kHz mono speech recordings as input and outputs a stream of ultra-low-bitrate discrete codes that preserve both the semantic content (what is said) and the acoustic characteristics (how it sounds) of the original speech. Speech synthesis researchers and developers can use these codes as compact modeling targets when building more efficient, higher-quality TTS models.
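The description gives the input format (24 kHz mono) but not the code rate. For a codec like this, bitrate is frame rate times bits per frame. A K-entry codebook index costs log2(K) bits, and each FSQ dimension with L levels costs log2(L) bits, before any entropy coding. The sketch below works through that arithmetic. Every parameter except the 24 kHz sample rate is a hypothetical placeholder, not a confirmed setting of this repository.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecConfig:
    sample_rate: int = 24_000           # from the description: 24 kHz mono input
    frame_rate: float = 12.5            # hypothetical: code frames per second
    semantic_codebook_size: int = 8192  # hypothetical: one VQ index per frame
    acoustic_dims: int = 36             # hypothetical: FSQ dimensions per frame
    fsq_levels: int = 21                # hypothetical: levels per FSQ dimension

    def hop_length(self) -> float:
        """Input samples consumed per code frame."""
        return self.sample_rate / self.frame_rate

    def bits_per_frame(self) -> float:
        """Raw (uncompressed) bits: one VQ index plus all FSQ dimensions."""
        semantic_bits = math.log2(self.semantic_codebook_size)
        acoustic_bits = self.acoustic_dims * math.log2(self.fsq_levels)
        return semantic_bits + acoustic_bits

    def bitrate_bps(self) -> float:
        return self.frame_rate * self.bits_per_frame()


def pcm_bitrate_bps(sample_rate: int, bits_per_sample: int = 16) -> int:
    """Bitrate of uncompressed mono PCM, used as a baseline."""
    return sample_rate * bits_per_sample


cfg = CodecConfig()
print(f"hop length: {cfg.hop_length():.0f} samples")             # hop length: 1920 samples
print(f"bitrate: {cfg.bitrate_bps() / 1000:.2f} kbps")            # bitrate: 2.14 kbps
ratio = pcm_bitrate_bps(cfg.sample_rate) / cfg.bitrate_bps()
print(f"about {ratio:.0f}x smaller than 16-bit PCM")              # about 180x smaller than 16-bit PCM
```

With these placeholder values the codes are roughly 180 times smaller than 16-bit PCM at the same sample rate; plugging in the repository's actual configuration would give the real figures.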

Use this if you are a speech synthesis researcher or developer building text-to-speech systems and need an efficient way to convert raw speech audio into discrete, low-bitrate codes for model training.

Not ideal if you need a pre-trained, ready-to-use text-to-speech model or are looking for a general-purpose audio compression tool for music or other sound types.

speech-synthesis text-to-speech audio-encoding voice-AI machine-learning-research

No License · No Package · No Dependents

Maintenance 13 / 25

Adoption 5 / 25

Maturity 1 / 25

Community 0 / 25

Forks

—

Language

Python

License

—

Higher-rated alternatives

kan-bayashi/ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch

fatchord/WaveRNN

WaveRNN Vocoder + TTS

shangeth/wavencoder

WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation,...

rishikksh20/iSTFTNet-pytorch

iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier...

seungwonpark/melgan

MelGAN vocoder (compatible with NVIDIA/tacotron2)
