zhenye234/CoMoSpeech

ACM MM 2023 CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model

40 / 100

Emerging

This project generates natural-sounding speech or singing from written text. You provide the words to be spoken or sung, and it produces high-quality audio in a single sampling step using a consistency model. It suits content creators, audiobook producers, game developers, and anyone who needs realistic text-to-speech or singing voice synthesis.

213 stars. No commits in the last 6 months.

Use this if you need to rapidly convert text into high-quality, natural-sounding speech or singing, even for large volumes of content.

Not ideal if you need to customize individual vocal nuances like emotion, specific intonation, or unique vocal characteristics beyond the base model's capabilities.

text-to-speech, audiobook production, voice-overs, singing voice synthesis, content creation

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 14 / 25


Stars: 213

Forks

Language: Python

License: MIT

Higher-rated alternatives

PrunaAI/pruna

Pruna is a model optimization framework built for developers, enabling you to deliver faster,...

bytedance/LatentSync

Taming Stable Diffusion for Lip Sync!

haoheliu/AudioLDM-training-finetuning

AudioLDM training, finetuning, evaluation and inference.

Text-to-Audio/Make-An-Audio

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

teticio/audio-diffusion

Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead...
