ivanvovk/durian-pytorch

Implementation of "Duration Informed Attention Network for Multimodal Synthesis" paper in PyTorch.

/ 100

Emerging

This project converts written text into synthetic speech. It takes phonemized, duration-aligned text as input and produces speech spectrograms, which can then be converted to audio. It is aimed at voice synthesis researchers, speech product developers, and content creators who need to generate synthetic speech.

184 stars. No commits in the last 6 months.

Use this if you need an encoder-decoder architecture for text-to-speech synthesis that uses phoneme duration information to produce more natural-sounding output.

Not ideal if you lack duration-aligned datasets, as preparing one can be a complex preliminary step, though a pre-trained duration model is provided.
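
To make "duration-aligned" concrete, here is a minimal sketch of the general idea behind duration-informed synthesis: each phoneme-level state is repeated for as many spectrogram frames as that phoneme lasts, so the decoder works on a frame-level sequence. This is an illustration only and is not taken from the repository; the function name `expand_by_duration` and the sample phonemes and durations are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def expand_by_duration(states: Sequence[T], durations: Sequence[int]) -> List[T]:
    """Repeat each per-phoneme state once per output frame it covers.

    Hypothetical helper for illustration; not the repository's API.
    """
    if len(states) != len(durations):
        raise ValueError("states and durations must have the same length")
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")

    frames: List[T] = []
    for state, n_frames in zip(states, durations):
        # A phoneme lasting n_frames contributes n_frames copies of its state.
        frames.extend([state] * n_frames)
    return frames


# Assumed toy input: three phonemes lasting 2, 1 and 3 frames.
phonemes = ["HH", "AY", "M"]
durations = [2, 1, 3]
frames = expand_by_duration(phonemes, durations)
print(frames)  # ['HH', 'HH', 'AY', 'M', 'M', 'M']
```

The frame-level sequence has length equal to the sum of the durations. In a tensor-based model the same expansion could be done on encoder outputs, for example with `torch.repeat_interleave`, though how this project implements it is not stated here.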

Speech Synthesis, Text-to-Speech, Voice Generation, Audio Content Creation, Acoustic Modeling

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 22 / 25


Stars 184

Forks

Language Python

License BSD-3-Clause

Higher-rated alternatives

bshall/Tacotron

A PyTorch implementation of Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis

Kyubyong/dc_tts

A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

DemisEom/SpecAugment

An implementation of SpecAugment, introduced by Google Brain, in TensorFlow and PyTorch

Rayhane-mamah/Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation

Kyubyong/tacotron

A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
