zhenye234/FlashSpeech

ACM MM 2024 FlashSpeech: Efficient Zero-Shot Speech Synthesis

/ 100

Experimental

FlashSpeech helps researchers and developers train custom speech synthesis models from scratch, even with limited data. It takes raw audio data (such as speech recordings) together with features extracted from that audio (pitch, phonetic codes, phonemes, and durations) and produces a trained model that can generate new, natural-sounding speech from text. It is aimed primarily at advanced researchers and machine learning engineers working on cutting-edge voice generation.

155 stars. No commits in the last 6 months.

Use this if you are an AI researcher or developer looking to train a highly efficient, zero-shot speech synthesis model using your own specialized audio datasets.

Not ideal if you need an out-of-the-box solution for generating speech or are not comfortable with advanced machine learning model training and data preparation.

speech-synthesis voice-generation audio-research machine-learning-engineering zero-shot-learning

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 11 / 25


Stars: 155
Forks:
Language: Python
License: None

Higher-rated alternatives

index-tts/index-tts

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

stepfun-ai/Step-Audio-EditX

A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing...

lucasnewman/f5-tts-mlx

Implementation of F5-TTS in MLX

unilight/seq2seq-vc

A sequence-to-sequence voice conversion toolkit.

FireRedTeam/FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System
