WWWWxp/M3-TTS

PyTorch implementation of the paper "M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis"

Score: 26 / 100 (Experimental)

This project generates high-quality, natural-sounding speech from text, including in voices the model has not seen before (zero-shot). You provide text and, optionally, a short audio sample of the target voice, and it generates spoken audio in that voice. It suits content creators, audiobook producers, and anyone who needs realistic voiceovers without extensive recording.

Use this if you need to generate high-fidelity speech from text in a wide variety of voices, including new ones, without needing to record extensive custom voice data.

Not ideal if you require highly specific control over subtle speech nuances, emotions, or unique vocalizations that go beyond standard text-to-speech capabilities.

Tags: voiceover production, audio content creation, speech synthesis, digital narration, text-to-speech
No License, No Package, No Dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 5 / 25
Community 5 / 25

Stars: 118
Forks: 3
Language: Python
License: None
Last pushed: Dec 18, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/WWWWxp/M3-TTS"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
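
The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint returns JSON and that the path follows a category/owner/repo pattern, both inferred from the single curl example above rather than from documented behavior. The names `build_quality_url` and `fetch_quality` are hypothetical helpers, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment.

    Assumption: the path layout is <category>/<owner>/<repo>, as in the
    one example URL shown on this page.
    """
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for one repository (keyless access).

    Assumption: the response body is JSON. The schema is not documented
    here, so the parsed object is returned as-is.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "WWWWxp", "M3-TTS")` requests the same URL as the curl command. With keyless access limited to 100 requests/day, caching the result locally is worth considering if you poll more than a few repositories.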