keonlee9420/DiffGAN-TTS
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
DiffGAN-TTS helps creators, educators, and content producers turn written text into high-quality, natural-sounding speech. Given input text, it generates audio files in the voice of a single speaker or of one of multiple speakers, with options to control characteristics such as pitch and speaking rate. It suits anyone who needs to produce voiceovers or other spoken content from text quickly.
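Speaking-rate control in FastSpeech 2-style acoustic models is commonly done by scaling the predicted per-phoneme durations before a "length regulator" expands phoneme-level features to frame level. The sketch below shows that general technique only. It is not taken from the DiffGAN-TTS code, and the function names `scale_durations` and `length_regulate` are hypothetical.

```python
from typing import List, Sequence


def scale_durations(durations: Sequence[float], rate: float) -> List[int]:
    """Scale predicted per-phoneme durations (in frames) by `rate`.

    rate > 1.0 lengthens each phoneme (slower speech); rate < 1.0 shortens it
    (faster speech). Results are rounded to whole frames (Python's round uses
    round-half-to-even) and clamped at zero.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return [max(0, round(d * rate)) for d in durations]


def length_regulate(
    features: Sequence[Sequence[float]], durations: Sequence[int]
) -> List[List[float]]:
    """Repeat each phoneme-level feature vector durations[i] times,
    producing a frame-level sequence (hypothetical helper for illustration)."""
    if len(features) != len(durations):
        raise ValueError("features and durations must have the same length")
    frames: List[List[float]] = []
    for vec, n in zip(features, durations):
        frames.extend(list(vec) for _ in range(n))
    return frames


if __name__ == "__main__":
    phoneme_feats = [[0.1], [0.2], [0.3]]  # toy 1-D features for 3 phonemes
    slow = scale_durations([2, 3, 1], rate=2.0)
    print(slow, len(length_regulate(phoneme_feats, slow)))
```

Doubling the rate factor doubles the number of output frames, which is what stretches the synthesized audio in time.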
347 stars. No commits in the last 6 months.
Use this if you need to generate realistic, high-fidelity speech from text for single or multiple speakers, with some control over vocal characteristics.
Not ideal if you require real-time speech synthesis for interactive applications, as this is geared towards generating audio files.
Stars: 347
Forks: 44
Language: Python
License: MIT
Category:
Last pushed: Feb 21, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/keonlee9420/DiffGAN-TTS"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
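The same request can be made from Python with the standard library. This is a minimal sketch under two assumptions: that the URL pattern `/quality/<category>/<owner>/<repo>` generalizes from the single example above, and that the endpoint returns JSON. The helper names are hypothetical, and no API-key handling is shown because the key mechanism is not documented here.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    # Assumed pattern, inferred from the one example URL on this page.
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> dict:
    # Unauthenticated request (counts toward the 100 requests/day limit).
    # Assumes the response body is JSON.
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(quality_url("diffusion", "keonlee9420", "DiffGAN-TTS"))
```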
Higher-rated alternatives
PrunaAI/pruna
Pruna is a model optimization framework built for developers, enabling you to deliver faster,...
bytedance/LatentSync
Taming Stable Diffusion for Lip Sync!
haoheliu/AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
Text-to-Audio/Make-An-Audio
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
teticio/audio-diffusion
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead...