ivanvovk/WaveGrad
Implementation of WaveGrad high-fidelity vocoder from Google Brain in PyTorch.
This project generates high-quality, natural-sounding speech from mel-spectrograms, which are time-frequency representations of audio. It takes pre-processed speech data as mel-spectrograms and outputs audio waveforms, the vocoder step in text-to-speech and voice synthesis systems. It is aimed at anyone synthesizing audio or lifelike voices from spectral data, such as speech technology researchers or developers building voice assistants.
408 stars. No commits in the last 6 months.
Use this if you need to convert mel-spectrograms into high-fidelity audio waveforms, especially when fast generation with fewer iterative refinement steps is important.
Not ideal if you are looking for an all-in-one text-to-speech solution that handles both text processing and audio generation, as this tool focuses specifically on the vocoder step.
Stars: 408
Forks: 53
Language: Jupyter Notebook
License: BSD-3-Clause
Category:
Last pushed: Jul 07, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/ivanvovk/WaveGrad"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
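The same request can be made from a script. The following is a minimal sketch using only the Python standard library. The function names `quality_url` and `fetch_quality` are hypothetical. The path layout (category, then owner, then repository) is inferred from the single curl example above, and the JSON response format is an assumption, so the code falls back to returning raw text if parsing fails.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; everything after it is
# assumed to follow the pattern <category>/<owner>/<repo>.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build an endpoint URL in the same shape as the curl example."""
    # Percent-encode each segment so unusual characters cannot break the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return API_BASE + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and return the parsed body.

    Assumption: the endpoint answers with JSON. If it does not,
    the raw response text is returned instead.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_quality("diffusion", "ivanvovk", "WaveGrad")` would request the same URL as the curl command. Keeping URL construction in its own function makes it possible to check the request target without touching the network or spending any of the daily request quota.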
Higher-rated alternatives
PrunaAI/pruna
Pruna is a model optimization framework built for developers, enabling you to deliver faster,...
bytedance/LatentSync
Taming Stable Diffusion for Lip Sync!
haoheliu/AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
Text-to-Audio/Make-An-Audio
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
teticio/audio-diffusion
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead...