NN-Project-2/Emotion-TTS-Emebddings
This project explores zero-shot emotional speech synthesis using EMOD, an approach that combines emotion and content embeddings for multilingual and cross-lingual emotion transfer. Built on a VITS-based TTS model, it preserves speaker identity while adding expressiveness, and it supports emotion transfer across languages and speaker genders.
This project helps create speech that sounds natural and expressive in various emotional tones, even across different languages and voices. It takes a piece of text and an emotional style (extracted from an audio sample) and generates speech that matches that emotion while keeping the original speaker's voice. Voice artists, content creators, or anyone needing highly customized, emotionally nuanced synthesized speech for diverse audiences would find this useful.
Use this if you need to generate high-quality, emotionally expressive synthesized speech and want to control the intensity and type of emotion independently of the speaker and language.
Not ideal if you only need basic, unemotional text-to-speech without fine-grained control over emotional expression or cross-lingual emotion transfer.
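The description above mentions combining emotion and content embeddings and controlling emotion intensity separately from speaker and language. The following is a toy sketch of one common way such conditioning can be done: scale an emotion vector by an intensity factor and append it to every content frame. It is not the repository's code. The function names, the concatenation strategy, and the intensity scaling are all assumptions made for illustration; the listing does not describe how EMOD actually fuses the embeddings.

```python
from typing import List, Sequence


def scale_emotion(emotion: Sequence[float], intensity: float) -> List[float]:
    """Scale a (hypothetical) emotion embedding by a non-negative intensity."""
    if intensity < 0.0:
        raise ValueError("intensity must be non-negative")
    return [intensity * value for value in emotion]


def condition_frames(
    content_frames: Sequence[Sequence[float]],
    emotion: Sequence[float],
    intensity: float = 1.0,
) -> List[List[float]]:
    """Append the scaled emotion vector to every content frame.

    Assumption: conditioning by concatenation. A real model might instead
    add, gate, or cross-attend to the emotion embedding.
    """
    scaled = scale_emotion(emotion, intensity)
    return [list(frame) + scaled for frame in content_frames]


if __name__ == "__main__":
    # Two content frames of size 2 and an emotion vector of size 2.
    frames = [[1.0, 2.0], [3.0, 4.0]]
    emotion = [0.5, -0.5]
    print(condition_frames(frames, emotion, intensity=2.0))
```

Because the emotion vector is kept separate from the content frames until the final step, the same text can be paired with different emotions or intensities without recomputing the content representation, which is the kind of independence the description refers to.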
Stars: 18
Forks: 3
Language: Python
License: not listed
Category: not listed
Last pushed: Dec 22, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/NN-Project-2/Emotion-TTS-Emebddings"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
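For scripted access, the same request can be made from Python with only the standard library. This is a minimal sketch: the base URL is taken from the curl command above, while the path layout (category, then owner/repository) is inferred from that single example, and the assumption that the endpoint returns JSON is not confirmed by this page. The function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base URL taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the request URL for a repository in a given category.

    Assumption: the path is <category>/<owner>/<repository>, as in the
    single curl example. The slash in `repo` is kept unescaped.
    """
    return f"{BASE_URL}/{quote(category)}/{quote(repo, safe='/')}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch the data for one repository.

    Assumption: the response body is JSON; the schema is not documented here.
    """
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "NN-Project-2/Emotion-TTS-Emebddings")` would issue the same request as the curl command. Unauthenticated use is subject to the daily request limit stated above.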
Higher-rated alternatives
index-tts/index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
stepfun-ai/Step-Audio-EditX
A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing...
lucasnewman/f5-tts-mlx
Implementation of F5-TTS in MLX
unilight/seq2seq-vc
A sequence-to-sequence voice conversion toolkit.
FireRedTeam/FireRedTTS
An Open-Sourced LLM-empowered Foundation TTS System