WWWWxp/M3-TTS

PyTorch implementation of the paper "M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis"


Experimental

This project generates high-quality, natural-sounding speech from text, including in voices the model was not trained on (zero-shot synthesis). You provide text and, optionally, a short audio sample of a voice, and it generates spoken audio in that voice. It is aimed at content creators, audiobook producers, or anyone who needs realistic voiceovers without extensive recording.

118 stars.

Use this if you need to generate high-fidelity speech from text in a wide variety of voices, including new ones, without needing to record extensive custom voice data.

Not ideal if you require highly specific control over subtle speech nuances, emotions, or unique vocalizations that go beyond standard text-to-speech capabilities.

voiceover production, audio content creation, speech synthesis, digital narration, text-to-speech

No License, No Package, No Dependents

Maintenance 6 / 25

Adoption 10 / 25

Maturity 5 / 25

Community 5 / 25


Stars 118

Forks

Language Python

License None

Higher-rated alternatives

TensorSpeech/TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for...

lucasnewman/nanospeech

A simple, hackable text-to-speech system in PyTorch and MLX

Tomiinek/Multilingual_Text_to_Speech

An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing,...

keonlee9420/STYLER

Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech...

jxzhanggg/nonparaSeq2seqVC_code

Implementation code of non-parallel sequence-to-sequence VC
