KinglittleQ/GST-Tacotron
A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
This project generates natural-sounding speech from Chinese text and gives creators and developers control over the style and emotion of the spoken output. Given Chinese text, it synthesizes audio that can express different styles (such as happy, sad, or formal), even when those styles were never explicitly labeled in the training data. It suits anyone creating audio content, such as voiceovers for videos, audiobooks, or interactive voice assistants.
374 stars. No commits in the last 6 months.
Use this if you need to convert Chinese text into speech with nuanced control over the vocal style, without needing to manually label specific emotions or speaking styles.
Not ideal if you primarily work with languages other than Chinese, or if you need a pre-built, production-ready speech synthesis service without any development or training overhead.
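The unlabeled style control described above comes from the Style Tokens approach named in the project title: a bank of learned token embeddings is combined, using attention weights, into a single style embedding that conditions the synthesizer. The following is a minimal pure-Python sketch of that weighted sum under simplifying assumptions (single-head scaled dot-product attention, toy 2-dimensional tokens). The function names and values are hypothetical and are not taken from this repository's code.

```python
import math

# Toy "style token" bank: three hypothetical 2-dimensional embeddings.
# In a trained model these would be learned parameters.
TOKENS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]


def attention_weights(query, tokens):
    """Softmax over scaled dot products between a query and each token."""
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, token)) / scale for token in tokens]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_tokens(weights, tokens):
    """Style embedding as the weighted sum of the token embeddings."""
    dim = len(tokens[0])
    return [sum(w * token[i] for w, token in zip(weights, tokens)) for i in range(dim)]


# Style taken from a reference: the query stands in for an encoding of
# reference audio (an assumption made for this sketch).
reference_query = [2.0, 0.0]
weights = attention_weights(reference_query, TOKENS)
style = combine_tokens(weights, TOKENS)

# Style chosen by hand: select the second token directly, with no reference.
manual_style = combine_tokens([0.0, 1.0, 0.0], TOKENS)
```

Setting the weights by hand, as in the last line, is the kind of control that needs no style labels: each token can be tried on its own to hear what it changes.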
Stars: 374
Forks: 71
Language: Python
License: MIT
Category:
Last pushed: Dec 08, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/KinglittleQ/GST-Tacotron"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
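The same request can be made from Python with only the standard library. This is a minimal sketch: the category/owner/repository URL pattern is inferred from the single curl example above, and the assumption that the endpoint returns JSON is not confirmed by this page. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the path layout is inferred from the curl example."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the listing data, assuming (unverified) that the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "KinglittleQ", "GST-Tacotron")` requests the same URL as the curl command. Without a key, the 100 requests/day limit applies.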
Compare
Higher-rated alternatives
bshall/Tacotron
A PyTorch implementation of Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
Kyubyong/dc_tts
A TensorFlow Implementation of DC-TTS: yet another text-to-speech model
DemisEom/SpecAugment
An Implementation of SpecAugment with TensorFlow & PyTorch, introduced by Google Brain
Rayhane-mamah/Tacotron-2
DeepMind's Tacotron-2 TensorFlow implementation
Kyubyong/tacotron
A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model