zhenye234/CoMoSpeech
ACM MM 2023 CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
This project creates natural-sounding speech or singing voices from written input. You provide the text to be spoken (or, for singing, the lyrics together with a musical score), and it quickly generates high-quality audio, since the underlying consistency model needs only a single sampling step. This is ideal for content creators, audiobook producers, game developers, or anyone who needs realistic text-to-speech or singing voice generation.
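The "one-step" in the title refers to sampling with a consistency model. A diffusion model runs many denoising iterations, while a trained consistency function maps pure noise to a clean sample in one network evaluation. The sketch below illustrates that idea in plain Python under stated assumptions. It uses the skip/output coefficients from the original consistency-models paper (Song et al., 2023) with commonly used defaults σ_min = 0.002, σ_max = 80, σ_data = 0.5, and `dummy_net` is a hypothetical stand-in for a trained network. This page does not say whether CoMoSpeech uses exactly these coefficients and values, and the sketch omits its text (or score) conditioning and vocoder stage.

```python
import math
import random
from typing import Callable, List

# Assumed hyperparameters: common consistency-model defaults, not read from CoMoSpeech's config.
SIGMA_MIN = 0.002
SIGMA_MAX = 80.0
SIGMA_DATA = 0.5

Net = Callable[[List[float], float], List[float]]


def c_skip(sigma: float) -> float:
    """Weight on the noisy input; equals 1 exactly at sigma == SIGMA_MIN."""
    return SIGMA_DATA**2 / ((sigma - SIGMA_MIN) ** 2 + SIGMA_DATA**2)


def c_out(sigma: float) -> float:
    """Weight on the network output; equals 0 exactly at sigma == SIGMA_MIN."""
    return SIGMA_DATA * (sigma - SIGMA_MIN) / math.sqrt(SIGMA_DATA**2 + sigma**2)


def consistency_fn(x: List[float], sigma: float, net: Net) -> List[float]:
    """f(x, sigma) = c_skip(sigma) * x + c_out(sigma) * F(x, sigma)."""
    out = net(x, sigma)
    return [c_skip(sigma) * xi + c_out(sigma) * oi for xi, oi in zip(x, out)]


def one_step_sample(dim: int, net: Net, seed: int = 0) -> List[float]:
    """Draw x_T = SIGMA_MAX * z with z ~ N(0, I), then map it to a sample in one call."""
    rng = random.Random(seed)
    x_t = [SIGMA_MAX * rng.gauss(0.0, 1.0) for _ in range(dim)]
    return consistency_fn(x_t, SIGMA_MAX, net)


def dummy_net(x: List[float], sigma: float) -> List[float]:
    """Hypothetical stand-in for a trained network: always predicts zeros."""
    return [0.0] * len(x)
```

For example, `one_step_sample(80, dummy_net)` returns 80 values from a single call to the network. The coefficients enforce the boundary condition f(x, σ_min) = x. Training then keeps the outputs consistent along each noise trajectory, so one evaluation at σ_max is enough.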
213 stars. No commits in the last 6 months.
Use this if you need to rapidly convert text into high-quality, natural-sounding speech or singing, even for large volumes of content.
Not ideal if you need to customize individual vocal nuances like emotion, specific intonation, or unique vocal characteristics beyond the base model's capabilities.
Stars: 213
Forks: 22
Language: Python
License: MIT
Category:
Last pushed: Apr 26, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/zhenye234/CoMoSpeech"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
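For scripted access, the curl request above can be issued from Python's standard library. This is a minimal sketch under assumptions. The response is taken to be JSON, because the page does not document its format. `quality_url` and `fetch_quality` are hypothetical helper names, and their `category` parameter is simply the path segment that appears in the curl URL. Passing the optional free key is left out because the page does not say how the key is supplied.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL, e.g. .../quality/<category>/<owner>/<name>."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body, assuming (not documented here) it is JSON."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_quality("diffusion", "zhenye234/CoMoSpeech")` requests the same URL as the curl command. Each call counts against the anonymous limit of 100 requests per day.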
Higher-rated alternatives
PrunaAI/pruna: Pruna is a model optimization framework built for developers, enabling you to deliver faster,...
bytedance/LatentSync: Taming Stable Diffusion for Lip Sync!
haoheliu/AudioLDM-training-finetuning: AudioLDM training, finetuning, evaluation and inference.
Text-to-Audio/Make-An-Audio: PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
teticio/audio-diffusion: Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead...