ivanvovk/durian-pytorch

Implementation of "Duration Informed Attention Network for Multimodal Synthesis" paper in PyTorch.

48 / 100 (Emerging)

This project synthesizes speech from written text. It takes phonemized, duration-aligned text as input and produces speech spectrograms, which a vocoder can then convert to audio. It may be useful to voice synthesis researchers, speech product developers, and content creators who need synthetic speech.

184 stars. No commits in the last 6 months.

Use this if you need an encoder-decoder architecture for text-to-speech synthesis that uses phoneme duration information to produce more natural-sounding output.

Not ideal if you lack duration-aligned datasets, as preparing one can be a complex preliminary step, though a pre-trained duration model is provided.
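To make the role of duration information concrete, here is a minimal sketch in plain Python of duration-based state expansion: each phoneme-level state is repeated once per aligned frame, so the expanded sequence has as many steps as the target spectrogram has frames. The function name `expand_by_duration` and the toy values are hypothetical and are not taken from this repository, which implements its model in PyTorch.

```python
from typing import List, Sequence


def expand_by_duration(
    states: Sequence[Sequence[float]], durations: Sequence[int]
) -> List[List[float]]:
    """Repeat each phoneme-level state for as many frames as its duration."""
    if len(states) != len(durations):
        raise ValueError("need exactly one duration per phoneme state")
    frames: List[List[float]] = []
    for state, n_frames in zip(states, durations):
        if n_frames < 0:
            raise ValueError("durations must be non-negative")
        # Copy the state so frames do not share the same list object.
        frames.extend(list(state) for _ in range(n_frames))
    return frames


# Hypothetical toy input: three phonemes with 2-dimensional states.
phoneme_states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 1, 3]  # frames per phoneme (made-up values)

frame_states = expand_by_duration(phoneme_states, durations)
print(len(frame_states))  # equals sum(durations)
```

This is why a duration-aligned dataset matters: without a per-phoneme frame count, there is nothing to drive the expansion step.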

Speech Synthesis, Text-to-Speech, Voice Generation, Audio Content Creation, Acoustic Modeling
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 22 / 25


Stars: 184
Forks: 48
Language: Python
License: BSD-3-Clause
Last pushed: Aug 12, 2020
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/ivanvovk/durian-pytorch"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
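The same request can be made from Python with only the standard library. The sketch below is a hedged example: the endpoint URL is the one shown in the curl command, but the assumption that the path follows a category/owner/repo pattern, and that the response body is JSON, is mine and is not confirmed by this page. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path layout is an assumption)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming it is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("voice-ai", "ivanvovk", "durian-pytorch")
print(url)

# Uncomment to perform the actual request (needs network access and
# counts against the daily request limit):
# data = fetch_quality("voice-ai", "ivanvovk", "durian-pytorch")
# print(data)
```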