keonlee9420/DiffGAN-TTS

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

Emerging

DiffGAN-TTS helps creators, educators, and content producers turn written text into high-quality, natural-sounding speech. You supply text, and it generates audio files in a single-speaker or multi-speaker voice, with options to control attributes such as pitch and speaking rate. It suits anyone who needs to produce voiceovers or spoken content from text quickly.

347 stars. No commits in the last 6 months.

Use this if you need to generate realistic, high-fidelity speech from text for single or multiple speakers, with some control over vocal characteristics.

Not ideal if you require real-time speech synthesis for interactive applications, as this is geared towards generating audio files.

text-to-speech voice-generation audiobook-creation elearning-content content-localization

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 17 / 25

Stars: 347

Forks

Language: Python

License: MIT

Higher-rated alternatives

PrunaAI/pruna

Pruna is a model optimization framework built for developers, enabling you to deliver faster,...

bytedance/LatentSync

Taming Stable Diffusion for Lip Sync!

haoheliu/AudioLDM-training-finetuning

AudioLDM training, finetuning, evaluation and inference.

Text-to-Audio/Make-An-Audio

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

teticio/audio-diffusion

Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead...
