jishengpeng/ControlSpeech
[ACL 2025 Main] ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
ControlSpeech helps content creators, marketers, and educators generate natural-sounding speech from text. You provide three inputs: a sample of the target speaker's voice, a text description of the desired speaking style (such as 'excited' or 'calm'), and the content you want spoken. The output is an audio file in which that content is spoken in the cloned voice and the specified style, without needing extensive training data for new voices or styles.
275 stars. No commits in the last 6 months.
Use this if you need to quickly create personalized audio content with specific vocal styles and diverse voices from minimal examples.
Not ideal if you require highly nuanced, professional voice acting or need to generate speech with extremely precise emotional or tonal control beyond what can be captured from a brief text prompt.
Stars: 275
Forks: 14
Language: Python
License: —
Category:
Last pushed: Nov 22, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/jishengpeng/ControlSpeech"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
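The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It assumes the endpoint follows a `category/owner/name` pattern (generalized from the single curl example above) and that the response body is JSON; neither is confirmed by this page, and the function names `build_quality_url` and `fetch_quality` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for one repository given as 'owner/name'.

    Assumption: the path is <category>/<owner>/<name>, as in the curl example.
    """
    owner, name = repo.split("/", 1)
    parts = [category, owner, name]
    # Percent-encode each path segment so unusual characters stay valid.
    return BASE_URL + "/" + "/".join(urllib.parse.quote(p, safe="") for p in parts)


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> dict:
    """GET the endpoint and decode the body.

    Assumption: the service returns JSON; the fields it contains are not
    documented on this page, so none are accessed here.
    """
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "jishengpeng/ControlSpeech")` issues the same request as the curl command. With the keyless limit of 100 requests/day, it may be worth caching the result locally rather than fetching on every run.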
Higher-rated alternatives
index-tts/index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
stepfun-ai/Step-Audio-EditX
A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing...
lucasnewman/f5-tts-mlx
Implementation of F5-TTS in MLX
unilight/seq2seq-vc
A sequence-to-sequence voice conversion toolkit.
FireRedTeam/FireRedTTS
An Open-Sourced LLM-empowered Foundation TTS System