lucasnewman/best-rq-pytorch

Implementation of BEST-RQ, a model for self-supervised learning of speech signals using a random projection quantizer, in PyTorch.

Emerging

This tool helps speech researchers and developers turn raw audio into discrete 'semantic tokens'. It takes unlabeled speech recordings as input and produces quantized representations of the audio content. It is particularly useful for those building speech synthesis or recognition systems who need to process audio efficiently.
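
To make the idea of a random projection quantizer more concrete, here is a minimal sketch using only the Python standard library. It is not the package's API: the names `make_quantizer` and `quantize` are hypothetical, and the specific choices (Gaussian initialization, L2 normalization, nearest code by cosine similarity) are assumptions based on the general technique. The key point it tries to show is that both the projection matrix and the codebook are random and fixed, so each audio feature frame maps to a token index without any training of the quantizer itself.

```python
import math
import random


def _l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def make_quantizer(feature_dim, code_dim, codebook_size, seed=0):
    """Build a fixed random projection and a fixed random codebook.

    Hypothetical helper: neither matrix is ever updated after creation.
    """
    rng = random.Random(seed)
    projection = [
        [rng.gauss(0.0, 1.0) for _ in range(code_dim)]
        for _ in range(feature_dim)
    ]
    codebook = [
        _l2_normalize([rng.gauss(0.0, 1.0) for _ in range(code_dim)])
        for _ in range(codebook_size)
    ]
    return projection, codebook


def quantize(frames, projection, codebook):
    """Map each feature frame to the index of its nearest codebook entry."""
    code_dim = len(projection[0])
    tokens = []
    for frame in frames:
        # Project the frame into the codebook space: frame @ projection.
        projected = [
            sum(value * row[j] for value, row in zip(frame, projection))
            for j in range(code_dim)
        ]
        projected = _l2_normalize(projected)
        # For unit vectors, the highest cosine similarity is the nearest code.
        scores = [
            sum(p * c for p, c in zip(projected, code)) for code in codebook
        ]
        tokens.append(max(range(len(codebook)), key=scores.__getitem__))
    return tokens
```

In this sketch, `frames` stands in for a sequence of per-frame audio features (for example, spectrogram frames). A real implementation would operate on batched tensors, but the mapping from frame to token index follows the same pattern.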

133 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to transform continuous speech signals into a sequence of discrete, semantically rich tokens for tasks like text-to-speech or automatic speech recognition.

Not ideal if you are looking for a pre-trained, ready-to-use speech recognition or synthesis model without needing to work with intermediate speech representations.

speech-synthesis speech-recognition audio-processing machine-learning-research

Maintenance 0 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 13 / 25

Stars 133

Forks

Language Python

License MIT

Higher-rated alternatives

kan-bayashi/ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch

fatchord/WaveRNN

WaveRNN Vocoder + TTS

shangeth/wavencoder

WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation,...

rishikksh20/iSTFTNet-pytorch

iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier...

seungwonpark/melgan

MelGAN vocoder (compatible with NVIDIA/tacotron2)
