KinglittleQ/GST-Tacotron

A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

Score: 48 / 100 (Emerging)

This project helps creators and developers generate natural-sounding speech from Chinese text, giving them control over the style and emotion of the spoken output. You input Chinese text and it synthesizes high-quality audio that can express different 'styles' (like happy, sad, or formal) even if those styles weren't explicitly labeled in the training data. This is useful for anyone creating audio content, such as voiceovers for videos, audiobooks, or interactive voice assistants.

374 stars. No commits in the last 6 months.

Use this if you need to convert Chinese text into speech with nuanced control over the vocal style, without needing to manually label specific emotions or speaking styles.

Not ideal if you primarily work with languages other than Chinese, or if you need a pre-built, production-ready speech synthesis service without any development or training overhead.

Tags: speech-synthesis, voice-generation, content-creation, text-to-speech, audio-production
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 22 / 25


Stars: 374
Forks: 71
Language: Python
License: MIT
Last pushed: Dec 08, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/KinglittleQ/GST-Tacotron"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
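
The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is category/owner/repo.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a category (e.g. 'voice-ai') and an 'owner/name' repo."""
    # Keep the slash between owner and name; percent-escape anything unusual.
    return f"{BASE_URL}/{quote(category, safe='')}/{quote(repo, safe='/')}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository.

    Assumption: the endpoint returns a JSON body. If it does not,
    json.loads raises a ValueError that the caller should handle.
    """
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "KinglittleQ/GST-Tacotron")` would request the same URL as the curl command. The field names in the response are not documented here, so inspect the returned object before relying on any key. With the keyless limit of 100 requests/day, caching results locally is likely worthwhile.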