Rongjiehuang/ProDiff
PyTorch implementation of ProDiff (ACM-MM'22), an extremely fast diffusion speech synthesis pipeline
This tool helps create high-quality, natural-sounding speech from written text quickly. You provide the text you want to be spoken, and it generates an audio file of that text being read aloud. This is ideal for content creators, educators, or businesses needing to convert scripts into spoken audio for various applications.
432 stars. No commits in the last 6 months.
Use this if you need to rapidly produce clear, high-fidelity spoken audio from text for commercial or personal use.
Not ideal if you need fine-grained control over vocal nuances such as emotion, specific accents, or distinctive voice characteristics beyond standard text-to-speech.
Stars: 432
Forks: 52
Language: Python
License: MIT
Category:
Last pushed: Apr 19, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/Rongjiehuang/ProDiff"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
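The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint returns JSON, which this page does not confirm, and the helper names `quality_url` and `fetch_quality` are hypothetical, introduced only for illustration. The URL pattern is copied from the curl example above.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the "diffusion" segment is kept as-is.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/diffusion"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError and the raw body should be inspected instead.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the URL needs no network access.
print(quality_url("Rongjiehuang", "ProDiff"))
```

Calling `fetch_quality("Rongjiehuang", "ProDiff")` performs the actual request. Because the keyless tier is limited to 100 requests/day, caching the result locally is advisable when polling many repositories.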
Higher-rated alternatives
PrunaAI/pruna
Pruna is a model optimization framework built for developers, enabling you to deliver faster,...
bytedance/LatentSync
Taming Stable Diffusion for Lip Sync!
haoheliu/AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
Text-to-Audio/Make-An-Audio
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
teticio/audio-diffusion
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead...