NN-Project-2/Emotion-TTS-Emebddings

This project explores zero-shot emotional speech synthesis using EMOD, an approach that combines emotion and content embeddings for multilingual and cross-lingual emotion transfer. Built on a VITS-based TTS model, it preserves speaker identity while adding expressiveness, and it transfers emotion efficiently across languages and genders.


Emerging

This project helps create speech that sounds natural and expressive in various emotional tones, even across different languages and voices. It takes a piece of text and an emotional style (extracted from an audio sample) and generates speech that matches that emotion while keeping the original speaker's voice. Voice artists, content creators, or anyone needing highly customized, emotionally nuanced synthesized speech for diverse audiences would find this useful.

Use this if you need to generate high-quality, emotionally expressive synthesized speech and want to control the intensity and type of emotion independently of the speaker and language.

Not ideal if you only need basic, unemotional text-to-speech without fine-grained control over emotional expression or cross-lingual emotion transfer.
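The idea described above, an emotion style taken from a reference audio sample and applied at a chosen intensity without altering the speaker's voice, can be illustrated with a toy sketch. This is not the project's code or API: every name here (`mean_pool`, `scale_emotion`, `build_conditioning`, `Conditioning`) is hypothetical, the vectors are stand-ins for real learned embeddings, and linear interpolation from a neutral embedding is just one simple way intensity control might be modelled.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


def mean_pool(frames: List[Vector]) -> Vector:
    """Average per-frame features into one fixed-size emotion embedding."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def scale_emotion(neutral: Vector, emotion: Vector, intensity: float) -> Vector:
    """Interpolate linearly from neutral (0.0) to the full emotion (1.0)."""
    return [n + intensity * (e - n) for n, e in zip(neutral, emotion)]


@dataclass
class Conditioning:
    """Hypothetical conditioning input for a TTS decoder."""
    speaker: Vector
    emotion: Vector

    def as_vector(self) -> Vector:
        # Concatenation keeps speaker and emotion in separate slots,
        # so changing one leaves the other untouched.
        return self.speaker + self.emotion


def build_conditioning(speaker: Vector, reference_frames: List[Vector],
                       neutral: Vector, intensity: float = 1.0) -> Conditioning:
    """Combine a speaker embedding with an intensity-scaled emotion embedding."""
    emotion = scale_emotion(neutral, mean_pool(reference_frames), intensity)
    return Conditioning(speaker=list(speaker), emotion=emotion)
```

In this sketch, calling `build_conditioning` with the same `speaker` vector and different `intensity` values changes only the emotion part of the result, which mirrors the independent control the description mentions.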

voice-synthesis audio-content-creation localization digital-voice-acting expressive-narration

No License, No Package, No Dependents

Maintenance 6 / 25

Adoption 6 / 25

Maturity 8 / 25

Community 12 / 25


Language: Python

License: none listed

Higher-rated alternatives

index-tts/index-tts

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

stepfun-ai/Step-Audio-EditX

A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing...

lucasnewman/f5-tts-mlx

Implementation of F5-TTS in MLX

unilight/seq2seq-vc

A sequence-to-sequence voice conversion toolkit.

FireRedTeam/FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System
