ivanvovk/durian-pytorch
PyTorch implementation of the "DurIAN: Duration Informed Attention Network for Multimodal Synthesis" paper.
This project helps create lifelike artificial voices by converting written text into spoken audio. It takes phonemized, duration-aligned text data as input and produces high-quality speech spectrograms, which a separate vocoder then converts into audio. Voice synthesis researchers, speech product developers, and content creators who want to generate synthetic speech would find it useful.
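The duration information is what separates DurIAN from purely attention-based models such as Tacotron. In the DurIAN paper, the text-to-frame alignment comes from a duration model rather than being learned by attention: each phoneme's encoder output is repeated once for every spectrogram frame the phoneme occupies. The following is a minimal NumPy sketch of that expansion step, assuming integer frame durations. `expand_by_duration` is a hypothetical name, not a function from this repository.

```python
import numpy as np


def expand_by_duration(encoder_states: np.ndarray, durations: np.ndarray) -> np.ndarray:
    """Expand phoneme-level encoder outputs to frame level (hypothetical helper).

    encoder_states: array of shape (num_phonemes, hidden_dim)
    durations:      integer array of shape (num_phonemes,), frames per phoneme
    returns:        array of shape (sum(durations), hidden_dim)
    """
    durations = np.asarray(durations)
    if encoder_states.shape[0] != durations.shape[0]:
        raise ValueError("expected exactly one duration per phoneme")
    if np.any(durations < 0):
        raise ValueError("durations must be non-negative")
    # Row i is repeated durations[i] times along the time axis.
    return np.repeat(encoder_states, durations, axis=0)


# Usage: 3 phonemes with hidden size 2, lasting 2, 1 and 3 frames.
states = np.arange(6, dtype=np.float32).reshape(3, 2)
frames = expand_by_duration(states, np.array([2, 1, 3]))
print(frames.shape)  # (6, 2)
```

In PyTorch the same operation is `torch.repeat_interleave(states, durations, dim=0)`. The decoder then sees exactly one input vector per output frame, so it cannot skip or repeat phonemes the way a misaligned attention mechanism can.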
184 stars. No commits in the last 6 months.
Use this if you need a robust encoder-decoder architecture for text-to-speech synthesis that leverages phoneme duration information for more natural-sounding output.
Not ideal if you lack a duration-aligned dataset. Preparing one can be a complex preliminary step, although a pre-trained duration model is provided.
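A common way to build such a dataset, assumed here and not necessarily this repository's pipeline, is to run a forced aligner that outputs phoneme end times in seconds and then convert those times into spectrogram frame counts. The sketch below shows that conversion. The 22050 Hz sample rate and hop length of 256 are hypothetical defaults and must match whatever feature extraction settings are actually used. The function name is also hypothetical, and the first phoneme is assumed to start at time 0.

```python
import numpy as np


def boundaries_to_frame_durations(end_times_sec, sample_rate=22050, hop_length=256):
    """Convert cumulative phoneme end times (seconds) to per-phoneme frame counts.

    Hypothetical helper. Assumes the first phoneme starts at t = 0.
    Rounding the cumulative frame positions (rather than each duration on its
    own) keeps the total at round(last_end * frames_per_second), so rounding
    errors do not accumulate across phonemes.
    """
    ends = np.asarray(end_times_sec, dtype=np.float64)
    if ends.size == 0:
        return np.zeros(0, dtype=np.int64)
    if ends[0] < 0 or np.any(np.diff(ends) < 0):
        raise ValueError("end times must be non-negative and non-decreasing")
    frame_ends = np.rint(ends * sample_rate / hop_length).astype(np.int64)
    # Differences between consecutive frame boundaries give the frame counts.
    return np.diff(frame_ends, prepend=0)


# Usage: three phonemes ending at 0.10 s, 0.25 s and 0.50 s.
print(boundaries_to_frame_durations([0.10, 0.25, 0.50]))  # [ 9 13 21]
```

The resulting counts sum to the number of frames in the utterance's spectrogram (43 here), which is the consistency condition a duration-based model needs during training.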
Stars: 184
Forks: 48
Language: Python
License: BSD-3-Clause
Category: (none listed)
Last pushed: Aug 12, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/ivanvovk/durian-pytorch"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
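For scripted access, the same request can be made from Python using only the standard library. This is a sketch under two assumptions: that the path pattern `/quality/<category>/<owner>/<repo>`, inferred from the single curl example above, holds for other repositories, and that the endpoint returns JSON. The response fields are not documented here, so the result is returned as parsed JSON without further interpretation. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    # Assumed path pattern, generalized from the one curl example.
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    # Keyless request (subject to the 100 requests/day limit).
    # Assumes a JSON response body; raises on HTTP or decoding errors.
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(fetch_quality("voice-ai", "ivanvovk", "durian-pytorch"))
```

Building the URL separately from the network call keeps the path logic testable offline.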
Higher-rated alternatives
bshall/Tacotron
A PyTorch implementation of Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
Kyubyong/dc_tts
A TensorFlow Implementation of DC-TTS: yet another text-to-speech model
DemisEom/SpecAugment
An implementation of SpecAugment, introduced by Google Brain, in TensorFlow and PyTorch
Rayhane-mamah/Tacotron-2
A TensorFlow implementation of Google's Tacotron-2
Kyubyong/tacotron
A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model