jishengpeng/ControlSpeech

[ACL 2025 Main] ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

Score: 29 / 100 · Experimental

ControlSpeech helps content creators, marketers, and educators generate natural-sounding speech from text. You provide a short sample of a speaker's voice, a text prompt describing the desired speaking style (such as 'excited' or 'calm'), and the content you want spoken. The output is an audio file in which that content is spoken in the cloned voice with the specified style, with no extensive training data needed for new voices or styles.

275 stars. No commits in the last 6 months.

Use this if you need to quickly create personalized audio content with specific vocal styles and diverse voices from minimal examples.

Not ideal if you require highly nuanced, professional voice acting or need to generate speech with extremely precise emotional or tonal control beyond what can be captured from a brief text prompt.

audio-content-creation voice-cloning text-to-speech digital-narration synthetic-media
No License · Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 11 / 25
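The headline score matches the total of the four category sub-scores (each out of 25). A quick check; note that the summing rule is an inference from the numbers shown, not documented behavior of the site:

```python
# Category sub-scores from the breakdown above (each out of 25).
subscores = {"Maintenance": 0, "Adoption": 10, "Maturity": 8, "Community": 11}

# Their sum equals the headline 29 / 100, suggesting the overall score
# is simply the total of the four category scores.
overall = sum(subscores.values())
print(overall)  # 29
```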


Stars: 275
Forks: 14
Language: Python
License: None
Last pushed: Nov 22, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/jishengpeng/ControlSpeech"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
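For scripted use, the same endpoint can be called from Python with the standard library. This is a minimal sketch: the `quality_url` and `fetch_quality` helpers are hypothetical names introduced here, and the JSON response schema is not documented on this page, so callers should inspect the returned dict rather than rely on specific keys.

```python
import json
import urllib.request

# Base endpoint taken verbatim from the curl example above; the
# "voice-ai" category segment also comes from that URL.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-score URL for a repository."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> dict:
    """GET the endpoint and decode the JSON body.

    The response schema is undocumented here, so this returns the raw
    parsed dict instead of picking out particular fields.
    """
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)


# Same request as the curl command above (uncomment to hit the API):
# print(fetch_quality("voice-ai", "jishengpeng", "ControlSpeech"))
```

Keeping the URL construction in its own function makes the network-free part easy to test, and keeps within the 100 requests/day keyless limit while developing.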