NN-Project-2/Emotion-TTS-Emebddings

This project explores zero-shot emotional speech synthesis using EMOD, a novel approach combining emotion and content embeddings for multilingual and cross-lingual emotion transfer. Built on a VITS-based TTS model, it preserves speaker identity while enhancing expressiveness, enabling emotion transfer across languages and genders efficiently.

Score: 32 / 100 (Emerging)

This project helps create speech that sounds natural and expressive in various emotional tones, even across different languages and voices. It takes a piece of text and an emotional style (extracted from an audio sample) and generates speech that matches that emotion while keeping the original speaker's voice. Voice artists, content creators, or anyone needing highly customized, emotionally nuanced synthesized speech for diverse audiences would find this useful.

Use this if you need to generate high-quality, emotionally expressive synthesized speech and want to control the intensity and type of emotion independently of the speaker and language.

Not ideal if you only need basic, unemotional text-to-speech without fine-grained control over emotional expression or cross-lingual emotion transfer.

voice-synthesis audio-content-creation localization digital-voice-acting expressive-narration
No License, No Package, No Dependents
Maintenance 6 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 12 / 25


Stars: 18
Forks: 3
Language: Python
License: None
Last pushed: Dec 22, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/NN-Project-2/Emotion-TTS-Emebddings"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
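The same request can be made from Python using only the standard library. The sketch below is a minimal illustration, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, and it assumes the endpoint returns a JSON body, which the listing does not state. Only the URL shape is taken from the curl example above.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example in the listing.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL in the same shape as the curl example."""
    # quote() escapes characters that are unsafe inside a URL path segment.
    path = "/".join(quote(part, safe="") for part in (category, owner, repo))
    return f"{API_BASE}/{path}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for one repository.

    Assumption: the response body is JSON. The field names it contains are
    not documented in the listing, so the parsed object is returned as-is.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "NN-Project-2", "Emotion-TTS-Emebddings")` issues the same GET request as the curl command. The keyless limit of 100 requests/day still applies, so it is worth caching the result rather than polling.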