herimor/voxtream

VoXtream is a full-stream, zero-shot TTS model with extremely low latency and speaking-rate control.


Established

This tool quickly turns written text into natural-sounding speech in any voice you provide, and it lets you adjust the speaking rate in real time while the audio is being generated. You supply a short audio clip of the target voice and the text to be spoken, and it outputs the spoken audio. It suits professionals who need dynamic, lifelike speech for applications such as virtual assistants, voiceovers, or interactive spoken interfaces.
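The "full-stream" flow described above (text consumed incrementally, audio emitted chunk by chunk, speaking rate changeable mid-utterance) can be illustrated with a toy sketch. This is not VoXtream's API: `synthesize_word`, `full_stream_tts`, `rate_fn`, `lookahead` and the 24 kHz sample rate are hypothetical, the voice prompt is omitted, and a sine tone stands in for the neural model. It only shows the shape of the loop: buffer at most `lookahead` future words, speak the oldest one, and ask `rate_fn` for the rate at each step.

```python
import math
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List

SAMPLE_RATE = 24_000  # hypothetical output sample rate (Hz)


def synthesize_word(word: str, rate: float) -> List[float]:
    """Hypothetical stand-in for one neural TTS step: a sine tone whose
    duration grows with word length and shrinks as `rate` increases."""
    seconds = 0.06 * len(word) / rate
    n = int(SAMPLE_RATE * seconds)
    return [0.1 * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE) for i in range(n)]


def full_stream_tts(
    words: Iterable[str],
    rate_fn: Callable[[int], float],
    lookahead: int = 1,
) -> Iterator[List[float]]:
    """Consume text word by word and yield one audio chunk per word.

    At most `lookahead` future words are buffered before the oldest one is
    spoken, so audio starts before the full text is known. `rate_fn(i)` is
    queried per spoken word, which lets the rate change mid-stream.
    """
    buffer: Deque[str] = deque()
    index = 0
    for word in words:
        buffer.append(word)
        if len(buffer) > lookahead:
            yield synthesize_word(buffer.popleft(), rate_fn(index))
            index += 1
    while buffer:  # flush the remaining words once the text stream ends
        yield synthesize_word(buffer.popleft(), rate_fn(index))
        index += 1
```

For example, `list(full_stream_tts(iter("hello streaming world".split()), lambda i: 1.0 if i < 2 else 1.5))` yields three chunks, with the last word spoken faster. A smaller `lookahead` starts audio sooner but gives the (real) model less right context.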

210 stars. Available on PyPI.

Use this if you need to generate high-quality streaming speech with very low latency, want fine-grained control over the speaking rate, and can provide a short audio example of the desired voice.

Not ideal if you need to generate very long audio clips (over 1 minute) or lack the computational resources it needs (a consumer GPU).

text-to-speech voice-generation audio-production virtual-assistants interactive-voice-response

Maintenance 13 / 25

Adoption 10 / 25

Maturity 24 / 25

Community 16 / 25


Stars

210

Language

Python

License

Apache-2.0

Related tools

EveryVoiceTTS/EveryVoice

The EveryVoice TTS Toolkit - Text To Speech for your language

thorstenMueller/Thorsten-Voice

Thorsten-Voice: A free-to-use, offline-working, high-quality German TTS voice should be...

daswer123/xtts-webui

Webui for using XTTS and for finetuning it

kadirnar/VoiceHub

VoiceHub: A Unified Inference Interface for TTS Models

skshadan/TTS-RVC-API

Text to Speech using Coqui TTS + RVC