Deepest-Project/MelNet
Implementation of "MelNet: A Generative Model for Audio in the Frequency Domain"
This project helps audio engineers and researchers generate realistic audio by modeling sound directly in the frequency domain: it learns to generate mel spectrograms, which are then inverted back into audible waveforms. You provide training data in the form of raw audio datasets, and it produces new, synthesized audio. It's designed for someone working on speech synthesis or exploring novel sound generation.
210 stars. No commits in the last 6 months.
Use this if you need to generate high-quality, synthetic audio from scratch or from text prompts, particularly for research in speech synthesis or creative sound design.
Not ideal if you need to extend or complete existing audio (primed generation) or if you are looking for a simple, out-of-the-box solution without deep customization.
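Since the model works on mel spectrograms rather than raw waveforms, it helps to see what that representation looks like. Below is a minimal, NumPy-only sketch of computing a log-mel spectrogram from a waveform; the parameter values (16 kHz sample rate, 1024-point FFT, hop of 256, 80 mel bands) are common conventions, not values taken from this repository, and the HTK-style mel formula is one of several variants.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other libraries use slightly different formulas)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale, mapped onto FFT bins
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(y, sr=16000, n_fft=1024, hop=256, n_mels=80):
    # Frame the signal, window each frame, take the magnitude STFT,
    # then project the linear-frequency bins onto the mel filterbank
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))         # (frames, n_fft//2 + 1)
    mel = mag @ mel_filterbank(sr, n_fft, n_mels).T   # (frames, n_mels)
    return np.log(mel + 1e-6).T                       # (n_mels, frames)

# Example: one second of a 440 Hz tone
sr = 16000
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
```

A generative model like MelNet is trained on arrays shaped like `spec` (mel bands by time frames); turning generated spectrograms back into audio requires a separate inversion step.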
Stars
210
Forks
41
Language
Python
License
MIT
Category
Voice AI
Last pushed
Jul 25, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/Deepest-Project/MelNet"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
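The same endpoint can be called from Python with the standard library. A minimal sketch, assuming the path layout shown in the curl example above (`/api/v1/quality/<category>/<owner>/<repo>`); the JSON response shape is not documented here, so the fetch helper just returns whatever the server sends.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    # Build the endpoint URL; path layout inferred from the curl example above
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category, owner, repo, timeout=10):
    # Unauthenticated requests are limited to 100/day per the note above.
    # The response is assumed to be JSON; we return it as a parsed dict.
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)

url = quality_url("voice-ai", "Deepest-Project", "MelNet")
```

With a free API key (1,000 requests/day), you would presumably pass it as a header or query parameter; the exact mechanism is not specified here.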
Higher-rated alternatives
kan-bayashi/ParallelWaveGAN
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
fatchord/WaveRNN
WaveRNN Vocoder + TTS
shangeth/wavencoder
WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation,...
rishikksh20/iSTFTNet-pytorch
iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier...
seungwonpark/melgan
MelGAN vocoder (compatible with NVIDIA/tacotron2)