lucasnewman/best-rq-pytorch

Implementation of BEST-RQ, a model for self-supervised learning of speech signals using a random projection quantizer, in PyTorch.

Score: 48 / 100 (Emerging)

This tool helps researchers and developers working on speech technology to create discrete 'semantic tokens' from raw audio. It takes unlabeled speech recordings as input and produces meaningful, quantized representations of the audio content. This is particularly useful for those building advanced speech synthesis or recognition systems who need to process audio efficiently.
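The idea behind a random projection quantizer can be sketched in a few lines. The example below is a dependency-free illustration, not the package's API: the function names (`make_quantizer`, `quantize`), the dimensions, and the plain Gaussian initialization are all assumptions chosen for readability. It projects each feature frame with a fixed random matrix, normalizes the result, and returns the index of the closest entry in a fixed random codebook as that frame's token.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def make_quantizer(feature_dim, code_dim, codebook_size, seed=0):
    """Create a frozen random projection and a frozen random codebook.

    Both are drawn once from a Gaussian and never trained (illustrative choice).
    """
    rng = random.Random(seed)
    projection = [
        [rng.gauss(0.0, 1.0) for _ in range(feature_dim)] for _ in range(code_dim)
    ]
    codebook = [
        l2_normalize([rng.gauss(0.0, 1.0) for _ in range(code_dim)])
        for _ in range(codebook_size)
    ]
    return projection, codebook


def quantize(frames, projection, codebook):
    """Map each feature frame to the index of its nearest codebook vector."""
    tokens = []
    for frame in frames:
        # Project the frame into the codebook space and normalize it.
        projected = l2_normalize(
            [sum(w * x for w, x in zip(row, frame)) for row in projection]
        )
        # For unit vectors, the highest dot product is the nearest neighbour.
        scores = [sum(p * c for p, c in zip(projected, code)) for code in codebook]
        tokens.append(max(range(len(codebook)), key=scores.__getitem__))
    return tokens


# Hypothetical input: 5 frames of 80-dimensional features (e.g. mel bins).
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(80)] for _ in range(5)]
projection, codebook = make_quantizer(feature_dim=80, code_dim=16, codebook_size=32)
tokens = quantize(frames, projection, codebook)
print(tokens)  # one integer token per input frame
```

Because the projection and codebook are fixed, the same audio features always map to the same tokens, which is what lets them serve as stable prediction targets during self-supervised training.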

133 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to transform continuous speech signals into a sequence of discrete, semantically rich tokens for tasks like text-to-speech or automatic speech recognition.

Not ideal if you are looking for a pre-trained, ready-to-use speech recognition or synthesis model without needing to work with intermediate speech representations.

speech-synthesis speech-recognition audio-processing machine-learning-research
Stale 6m
Maintenance 0 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 13 / 25

Stars: 133
Forks: 12
Language: Python
License: MIT
Last pushed: Sep 25, 2023
Commits (30d): 0
Dependencies: 7

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/lucasnewman/best-rq-pytorch"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
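
The same request can be made from Python using only the standard library. This is a minimal sketch: the helper names (`build_url`, `fetch_quality`) are hypothetical, the category/owner/repo path layout is inferred from the single URL above, and because the response format is not documented here, the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category, owner, repo):
    """Assemble the endpoint URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *segments])


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality data for one repository and return the raw body."""
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_quality("voice-ai", "lucasnewman", "best-rq-pytorch")` issues the same GET request as the curl command; unauthenticated calls count against the 100 requests/day limit.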