herimor/voxtream

VoXtream is a full-stream, zero-shot TTS model with extremely low latency and speaking-rate control.

Score: 63 / 100 (Established)

This tool turns written text into natural-sounding speech in any voice you provide, and it lets you adjust the speaking rate in real time while the audio is being generated. You supply a short audio clip of the target voice along with the text to be spoken, and it outputs speech in that voice. It suits applications that need dynamic, lifelike speech, such as virtual assistants, voiceovers, and interactive voice interfaces.

210 stars. Available on PyPI.

Use this if you need high-quality streaming speech with very low latency, want fine-grained control over the speaking rate, and can provide a short audio sample of the desired voice.

Not ideal if you need to generate very long audio clips (over 1 minute) or lack the required hardware (a consumer-grade GPU).

text-to-speech voice-generation audio-production virtual-assistants interactive-voice-response
Maintenance 13 / 25
Adoption 10 / 25
Maturity 24 / 25
Community 16 / 25


Stars: 210
Forks: 24
Language: Python
License: Apache-2.0
Last pushed: Mar 17, 2026
Commits (30d): 0
Dependencies: 19

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/herimor/voxtream"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
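The curl request above can also be made from Python using only the standard library. This is a minimal sketch under stated assumptions: the URL layout (`/api/v1/quality/<category>/<owner>/<repo>`) is inferred from the single example URL, the response is assumed to be JSON, and the helper names `quality_url` and `fetch_quality` are hypothetical. How an API key is passed is not documented here, so the sketch makes anonymous requests only.

```python
import json
import urllib.request

# Base URL taken from the curl example; the path layout below is an
# assumption inferred from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the (assumed) quality-endpoint URL for a repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch one record anonymously and decode it, assuming a JSON body."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the URL only; calling fetch_quality() would use one request
    # from the anonymous daily quota.
    print(quality_url("voice-ai", "herimor", "voxtream"))
```

Calling `fetch_quality("voice-ai", "herimor", "voxtream")` issues a single request against the 100 requests/day anonymous quota; the structure of the returned dictionary is not described on this page.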