jishengpeng/ControlSpeech

[ACL 2025 Main] ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec


Experimental

ControlSpeech helps content creators, marketers, and educators generate natural-sounding speech from text. You provide a short sample of a speaker's voice, a text description of the desired speaking style (such as 'excited' or 'calm'), and the content you want spoken. The output is an audio file in which that content is spoken in the cloned voice and the specified style, without needing extensive training data for new voices or styles.

275 stars. No commits in the last 6 months.

Use this if you need to quickly create personalized audio content with specific vocal styles and diverse voices from minimal examples.

Not ideal if you require highly nuanced, professional voice acting or need to generate speech with extremely precise emotional or tonal control beyond what can be captured from a brief text prompt.

audio-content-creation voice-cloning text-to-speech digital-narration synthetic-media

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 11 / 25


Stars

275

Forks

Language

Python

License

—

Higher-rated alternatives

index-tts/index-tts

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

stepfun-ai/Step-Audio-EditX

A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing...

lucasnewman/f5-tts-mlx

Implementation of F5-TTS in MLX

unilight/seq2seq-vc

A sequence-to-sequence voice conversion toolkit.

FireRedTeam/FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System
