Aratako/T5Gemma-TTS
Multilingual TTS model with voice cloning and duration control, based on the T5Gemma encoder-decoder LLM
This project helps content creators, educators, and communicators generate natural-sounding speech from text in English, Chinese, and Japanese. You provide the text you want spoken and, optionally, a reference audio clip, and it produces an audio file of the spoken content. It suits anyone who wants custom narration or voiceovers in a consistent voice without hiring a professional voice actor.
Use this if you need to quickly generate multilingual voiceovers, e-learning content, or spoken narratives, especially when you want to clone a specific voice or control the audio duration.
Not ideal if you need to generate voices in languages other than English, Chinese, or Japanese, or if you require extremely nuanced, emotion-driven vocal performances that go beyond standard voice cloning capabilities.
Stars: 280
Forks: 29
Language: Python
License: MIT
Category:
Last pushed: Dec 23, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Aratako/T5Gemma-TTS"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
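The same request can be made from Python. The sketch below uses only the standard library and rests on two assumptions that the listing does not confirm: that the `transformers` path segment is a category slug, and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical, chosen for illustration.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(repo: str, category: str = "transformers") -> str:
    """Build the endpoint URL for an "owner/name" repo.

    Assumption: the segment before the repo is a category slug.
    """
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(repo: str, category: str = "transformers", timeout: float = 10.0):
    """Fetch the record for a repo and decode the body.

    Assumption: the response body is JSON; the listing does not say.
    """
    with urllib.request.urlopen(quality_url(repo, category), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("Aratako/T5Gemma-TTS")
```

With the keyless limit of 100 requests/day, it is worth caching the result locally rather than calling `fetch_quality` on every run.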
Higher-rated alternatives
edwko/OuteTTS
Interface for OuteTTS models.
fluxions-ai/vui
100M parameter lightweight conversational text-to-speech model with breaths, laughter,...
OpenVoiceOS/ovos-audio-transformer-plugin-ggwave
data over sound plugin
inboxpraveen/LLM-Minutes-of-Meeting
🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates...
mbzuai-oryx/LLMVoX
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM