ivanvovk/WaveGrad

Implementation of WaveGrad high-fidelity vocoder from Google Brain in PyTorch.

Emerging

This project generates high-quality, natural-sounding speech from mel-spectrograms, which are compact time-frequency representations of audio. It takes pre-processed speech data (as mel-spectrograms) and outputs audio waveforms, which makes it useful as the final stage of text-to-speech or voice synthesis systems. It is aimed at anyone working on audio synthesis or generating lifelike voices from spectral data, such as speech technology researchers or developers building voice assistants.
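To make the input/output relationship concrete, here is a minimal sketch of how the shape of a mel-spectrogram relates to the length of the waveform a vocoder produces from it. The sample rate, hop length, and number of mel bands below are illustrative assumptions, not values confirmed by this page; check the project's own configuration for the real ones. The helper names are hypothetical as well.

```python
import math
from typing import Tuple

# Assumed values for illustration only; the project's config may differ.
SAMPLE_RATE = 22050  # audio samples per second (assumption)
HOP_LENGTH = 300     # waveform samples covered by one mel frame (assumption)
N_MELS = 80          # mel bands per frame (assumption)


def mel_shape_for_duration(seconds: float) -> Tuple[int, int]:
    """Return (n_mels, n_frames) needed to cover `seconds` of audio."""
    n_samples = int(round(seconds * SAMPLE_RATE))
    n_frames = math.ceil(n_samples / HOP_LENGTH)
    return (N_MELS, n_frames)


def waveform_length(n_frames: int) -> int:
    """Number of audio samples a vocoder emits for `n_frames` mel frames."""
    return n_frames * HOP_LENGTH


if __name__ == "__main__":
    shape = mel_shape_for_duration(2.0)
    print("mel-spectrogram shape:", shape)
    print("output samples:", waveform_length(shape[1]))
```

Under these assumptions, each mel frame expands into a fixed number of audio samples, so the vocoder's job is to upsample a short sequence of spectral frames into a much longer waveform.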

408 stars. No commits in the last 6 months.

Use this if you need to convert mel-spectrograms into high-fidelity audio waveforms efficiently, especially when fast generation using only a few refinement iterations matters.

Not ideal if you are looking for an all-in-one text-to-speech solution that handles both text processing and audio generation, as this tool focuses specifically on the vocoder step.

speech-synthesis voice-generation audio-processing text-to-speech digital-audio

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 18 / 25

Stars 408

Forks

Language Jupyter Notebook

License BSD-3-Clause

Higher-rated alternatives

PrunaAI/pruna

Pruna is a model optimization framework built for developers, enabling you to deliver faster,...

bytedance/LatentSync

Taming Stable Diffusion for Lip Sync!

haoheliu/AudioLDM-training-finetuning

AudioLDM training, finetuning, evaluation and inference.

Text-to-Audio/Make-An-Audio

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

teticio/audio-diffusion

Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead...
