zhenye234/FlashSpeech
ACM MM 2024 FlashSpeech: Efficient Zero-Shot Speech Synthesis
FlashSpeech helps researchers and developers train custom speech synthesis models from scratch, even with limited data. Training takes raw speech recordings together with features extracted from them (pitch, phonetic codes, phonemes, and durations) and produces a model that generates new, natural-sounding speech from text. It is aimed primarily at advanced researchers and machine learning engineers working on voice generation.
155 stars. No commits in the last 6 months.
Use this if you are an AI researcher or developer looking to train a highly efficient, zero-shot speech synthesis model using your own specialized audio datasets.
Not ideal if you need an out-of-the-box solution for generating speech or are not comfortable with advanced machine learning model training and data preparation.
Stars: 155
Forks: 11
Language: Python
License: —
Category:
Last pushed: Sep 20, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/zhenye234/FlashSpeech"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
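The same request can be made from Python. The sketch below is a minimal illustration, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred only from the single curl URL above, and the response is assumed to be JSON, which this page does not state.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble an endpoint URL in the same shape as the curl example.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    one URL shown on this page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository (hypothetical helper).

    Assumption: the body is JSON. If it is not, json.loads raises
    a ValueError and the raw text should be inspected instead.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body)


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to make the request.
    print(build_quality_url("voice-ai", "zhenye234", "FlashSpeech"))
```

Each call to `fetch_quality` counts against the 100 requests/day keyless limit, so caching the result locally is sensible when polling several repositories.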
Higher-rated alternatives
index-tts/index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
stepfun-ai/Step-Audio-EditX
A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing...
lucasnewman/f5-tts-mlx
Implementation of F5-TTS in MLX
unilight/seq2seq-vc
A sequence-to-sequence voice conversion toolkit.
FireRedTeam/FireRedTTS
An Open-Sourced LLM-empowered Foundation TTS System