stepfun-ai/Step-Audio-EditX
A powerful 3B-parameter, LLM-based, reinforcement-learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech.
This tool helps content creators, voice actors, and marketers refine spoken audio. You input text and define desired emotions, speaking styles, or paralinguistic elements. The output is natural-sounding synthetic speech that precisely conveys the intended tone, ideal for generating expressive voiceovers or dialogue. It also supports zero-shot text-to-speech for various languages.
884 stars. Actively maintained with 1 commit in the last 30 days.
Use this if you need fine-grained control over the emotional tone, speaking style, and specific human sounds (like laughter or sighs) in your synthetic speech or voiceovers.
Not ideal if you're looking for simple, unedited text-to-speech without needing to adjust nuanced emotional or stylistic elements.
Stars: 884
Forks: 61
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 16, 2026
Commits (30d): 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/stepfun-ai/Step-Audio-EditX"
Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
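For scripted access, the same endpoint can also be called from Python. This is a minimal sketch that uses only the standard library. The base URL comes from the curl command above. The general `{category}/{owner}/{repo}` path layout and the helper names `build_quality_url` and `fetch_quality` are hypothetical, generalized from that single example. The JSON response schema is not documented here, so the code only decodes it and does not read specific fields.

```python
import json
import urllib.request
from urllib.parse import quote

# Base URL taken from the curl example; the per-category/owner/repo path
# layout below is an ASSUMPTION generalized from that one example.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the (assumed) quality-endpoint URL for one repository.

    Each path segment is percent-encoded so that characters such as "/"
    inside a name cannot change the path structure.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the endpoint and decode its JSON body.

    The response schema is not documented here, so the decoded object is
    returned as-is. No API key is sent, so calls count against the
    keyless daily limit.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_quality("voice-ai", "stepfun-ai", "Step-Audio-EditX")` requests the same URL as the curl command. Percent-encoding each segment separately keeps a stray "/" in a name from silently pointing the request at a different path.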
Related tools
index-tts/index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
lucasnewman/f5-tts-mlx
Implementation of F5-TTS in MLX
unilight/seq2seq-vc
A sequence-to-sequence voice conversion toolkit.
FireRedTeam/FireRedTTS
An Open-Sourced LLM-empowered Foundation TTS System
RaduBolbo/F5-TTS-Emotional-CFG
Zero-shot voice cloning text-to-speech (TTS) with explicit emotion class conditioning built on F5-TTS