stepfun-ai/Step-Audio-EditX

A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech.

/ 100

Established

This tool helps content creators, voice actors, and marketers refine spoken audio. You input text and define desired emotions, speaking styles, or paralinguistic elements. The output is natural-sounding synthetic speech that precisely conveys the intended tone, ideal for generating expressive voiceovers or dialogue. It also supports zero-shot text-to-speech for various languages.

884 stars. Actively maintained with 1 commit in the last 30 days.

Use this if you need fine-grained control over the emotional tone, speaking style, and specific human sounds (like laughter or sighs) in your synthetic speech or voiceovers.

Not ideal if you only need simple text-to-speech and have no need to adjust nuanced emotional or stylistic elements.

audio-production voiceover-creation digital-content-creation marketing-audio e-learning-narration

No Package, No Dependents

Maintenance 16 / 25

Adoption 10 / 25

Maturity 13 / 25

Community 15 / 25

Stars

884

Forks

Language

Python

License

Apache-2.0

Related tools

index-tts/index-tts

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

lucasnewman/f5-tts-mlx

Implementation of F5-TTS in MLX

unilight/seq2seq-vc

A sequence-to-sequence voice conversion toolkit.

FireRedTeam/FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System

RaduBolbo/F5-TTS-Emotional-CFG

Zero-shot voice cloning text-to-speech (TTS) with explicit emotion class conditioning built on F5-TTS
