WWWWxp/M3-TTS
PyTorch implementation of the paper "M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis"
This project helps you create high-quality, natural-sounding speech from text, even for voices you've never used before. You provide text and, optionally, a short audio sample of a voice, and it generates spoken audio in that voice. It's ideal for content creators, audiobook producers, or anyone needing realistic voiceovers without extensive recording.
Use this if you need to generate high-fidelity speech from text in a wide variety of voices, including new ones, without needing to record extensive custom voice data.
Not ideal if you require highly specific control over subtle speech nuances, emotions, or unique vocalizations that go beyond standard text-to-speech capabilities.
Stars: 118
Forks: 3
Language: Python
License: not specified
Category:
Last pushed: Dec 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/WWWWxp/M3-TTS"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
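The same request can be made from a script. The sketch below uses only the Python standard library and mirrors the curl command above. Two things are assumptions rather than documented behavior: the path layout (category, then owner, then repository) is inferred from the single URL shown, and the response body is assumed to be JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and parse the body as JSON (assumed format)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to actually hit the API.
    print(quality_url("voice-ai", "WWWWxp", "M3-TTS"))
```

Each call to `fetch_quality` counts against the daily request limit, so caching the result locally is worth considering when polling many repositories.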
Higher-rated alternatives
TensorSpeech/TensorFlowTTS
TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for...
lucasnewman/nanospeech
A simple, hackable text-to-speech system in PyTorch and MLX
Tomiinek/Multilingual_Text_to_Speech
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing,...
keonlee9420/STYLER
Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech...
jxzhanggg/nonparaSeq2seqVC_code
Implementation code of non-parallel sequence-to-sequence VC