zhenye234/FlashSpeech

ACM MM 2024 FlashSpeech: Efficient Zero-Shot Speech Synthesis

29 / 100
Experimental

FlashSpeech helps researchers and developers train custom speech synthesis models from scratch, even with limited data. It takes raw audio (such as speech recordings) together with features extracted from that audio (pitch, phonetic codes, phonemes, and durations) and produces a trained model that generates natural-sounding speech from text. It is aimed primarily at researchers and machine learning engineers working on voice generation.

155 stars. No commits in the last 6 months.

Use this if you are an AI researcher or developer looking to train a highly efficient, zero-shot speech synthesis model using your own specialized audio datasets.

Not ideal if you need an out-of-the-box solution for generating speech or are not comfortable with advanced machine learning model training and data preparation.

speech-synthesis voice-generation audio-research machine-learning-engineering zero-shot-learning
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 11 / 25
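
The page does not state how the overall score is derived. The four category scores listed above do add up to the headline 29 / 100, so one plausible reading is a plain sum of four 25-point categories. The short sketch below checks that arithmetic; the summing rule itself is an assumption, not something the source confirms.

```python
# Category scores copied from the listing above; each is out of 25.
subscores = {"Maintenance": 0, "Adoption": 10, "Maturity": 8, "Community": 11}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the plain sum of the category scores.
total = sum(subscores.values())
max_total = MAX_PER_CATEGORY * len(subscores)

print(f"{total} / {max_total}")  # prints "29 / 100"
```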

Stars: 155
Forks: 11
Language: Python
License: None
Last pushed: Sep 20, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/zhenye234/FlashSpeech"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
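
The same request as the curl command above can be made from Python using only the standard library. This is a minimal sketch: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single URL shown above, and the response is assumed to be JSON, since the page does not document its schema.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path layout inferred from the one example URL)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send an unauthenticated GET and decode the body.

    Assumption: the endpoint returns JSON; the schema is not documented here.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to actually hit the network.
    print(build_quality_url("voice-ai", "zhenye234", "FlashSpeech"))
```

Calling `fetch_quality("voice-ai", "zhenye234", "FlashSpeech")` counts against the keyless limit of 100 requests/day, so caching the result locally is worth considering when polling many repositories.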