herimor/voxtream
VoXtream is a full-stream, zero-shot TTS model with extremely low latency and speaking-rate control.
This tool quickly turns written text into natural-sounding speech in the voice of a short reference clip you provide, and it lets you adjust the speaking speed in real time while the audio is being generated. You supply a short audio clip of the target voice and the text to be spoken, and it outputs the spoken audio. It suits professionals who need dynamic, lifelike speech for applications such as virtual assistants, voiceovers, or interactive spoken interfaces.
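To make the streaming idea concrete, here is a toy sketch of how a full-stream pipeline behaves, reading "full-stream" as text arriving incrementally and audio leaving incrementally. It does not use VoXtream or its API: `stub_synthesize_word`, `stream_tts`, `SAMPLE_RATE` and `SAMPLES_PER_CHAR` are hypothetical names, the "model" just emits a quiet sine tone, and a real model would also condition on the reference voice clip. The points it shows are that the first audio chunk is available after only one word has been read, and that a per-word rate function can change the speaking speed mid-utterance.

```python
import math
from typing import Callable, Iterable, Iterator, List

SAMPLE_RATE = 24_000      # hypothetical output sample rate (Hz)
SAMPLES_PER_CHAR = 1_440  # hypothetical: 60 ms of audio per character at rate 1.0

def stub_synthesize_word(word: str, rate: float) -> List[float]:
    """Stand-in for a real acoustic model (NOT VoXtream): a quiet 220 Hz tone.

    Duration scales with word length and inversely with `rate`,
    so rate=2.0 produces half as many samples (faster speech).
    """
    n = int(len(word) * SAMPLES_PER_CHAR / rate)
    return [0.1 * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE) for i in range(n)]

def stream_tts(words: Iterable[str], rate_fn: Callable[[int], float]) -> Iterator[List[float]]:
    """Yield one audio chunk per word as soon as that word arrives.

    The full text is never buffered, and rate_fn(i) may return a different
    speaking rate for each word, so speed can change mid-utterance.
    """
    for i, word in enumerate(words):
        yield stub_synthesize_word(word, rate_fn(i))

consumed: List[str] = []

def incoming_text() -> Iterator[str]:
    """Simulates text arriving word by word (e.g. from an LLM); records what was read."""
    for word in "hello streaming world".split():
        consumed.append(word)
        yield word

# Normal speed for the first two words, then twice as fast.
stream = stream_tts(incoming_text(), rate_fn=lambda i: 1.0 if i < 2 else 2.0)
first_chunk = next(stream)  # audio is ready after reading only one word
words_read_before_first_audio = len(consumed)
rest = list(stream)
chunk_lengths = [len(first_chunk)] + [len(c) for c in rest]
print(words_read_before_first_audio, chunk_lengths)  # 1 [7200, 12960, 3600]
```

Because `stream_tts` is a generator, latency to first audio depends on the first word only, not on the length of the whole text; that is the property a low-latency streaming TTS system is built around.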
210 stars. Available on PyPI.
Use this if you need high-quality streaming speech with very low latency, want fine-grained control over the speaking rate, and can provide a short audio sample of the target voice.
Not ideal if you need to generate long audio clips (over 1 minute) or lack the required compute (a consumer GPU).
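If your text may run past the 1-minute mark, one generic workaround (not something the project documents) is to estimate spoken duration and split the text at sentence boundaries into chunks that each stay under the limit. This sketch assumes an average of 150 words per minute at speed 1.0; `WORDS_PER_MINUTE`, `estimate_seconds` and `split_for_synthesis` are hypothetical names, and a single sentence that is itself too long is left as its own chunk.

```python
import re
from typing import List

MAX_SECONDS = 60.0        # the ~1-minute limit mentioned above
WORDS_PER_MINUTE = 150.0  # assumption: average speaking rate at speed 1.0

def estimate_seconds(text: str, speed: float = 1.0) -> float:
    """Rough spoken duration in seconds: word count / (wpm * speed)."""
    words = len(text.split())
    return 60.0 * words / (WORDS_PER_MINUTE * speed)

def split_for_synthesis(text: str, speed: float = 1.0,
                        max_seconds: float = MAX_SECONDS) -> List[str]:
    """Greedily pack whole sentences into chunks estimated to fit max_seconds.

    A sentence longer than the limit on its own is kept as a single chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and estimate_seconds(candidate, speed) > max_seconds:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# 200 five-word sentences = 1,000 words, roughly 400 s at the assumed rate.
text = " ".join(["one two three four five."] * 200)
chunks = split_for_synthesis(text)
print(len(chunks), [round(estimate_seconds(c)) for c in chunks])
```

Raising `speed` shortens the estimate, so faster speech packs more words into each chunk.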
Stars: 210
Forks: 24
Language: Python
License: Apache-2.0
Category: (not listed)
Last pushed: Mar 17, 2026
Commits (30d): 0
Dependencies: 19
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/herimor/voxtream"
Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.
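The same request can be made from Python with only the standard library. This is a sketch: the `category/owner/repo` URL pattern is inferred from the single curl example above, the response is assumed to be JSON, and `quality_url` and `fetch_quality` are hypothetical helper names. How an API key is passed is not stated here, so keyless access is shown.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (pattern inferred from the curl example)."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])

def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)

# Usage (performs a network request, counted against the daily limit):
# data = fetch_quality("voice-ai", "herimor", "voxtream")
```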
Related tools
EveryVoiceTTS/EveryVoice
The EveryVoice TTS Toolkit - Text To Speech for your language
thorstenMueller/Thorsten-Voice
Thorsten-Voice: A free to use, offline working, high quality german TTS voice should be...
daswer123/xtts-webui
Webui for using XTTS and for finetuning it
kadirnar/VoiceHub
VoiceHub: A Unified Inference Interface for TTS Models
skshadan/TTS-RVC-API
Text to Speech using Coqui TTS + RVC