mbzuai-oryx/LLMVoX
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
LLMVoX helps create highly responsive, voice-based conversational AI systems. It takes the text output of any Large Language Model (LLM) or Vision-Language Model (VLM) and converts it into natural-sounding speech as a low-latency stream, allowing for real-time spoken dialogue. This is ideal for developers building interactive voice agents, virtual assistants, or any application that needs an LLM to "speak" quickly and clearly.
299 stars. No commits in the last 6 months.
Use this if you are a developer looking to integrate high-quality, low-latency streaming speech generation into your Large Language Model applications without needing to fine-tune the LLM itself.
Not ideal if you need a non-streaming, batch text-to-speech solution or if you don't have access to modern GPU hardware.
Stars: 299
Forks: 40
Language: Python
License: —
Category:
Last pushed: May 16, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/mbzuai-oryx/LLMVoX"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
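The same request can be made from Python. The following is a minimal sketch, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the reading of the path as category/owner/repo is inferred only from the curl URL above, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters stay inside it.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text (hypothetical helper).

    The response format is an unknown here, so nothing is parsed.
    """
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


# Example call (performs a network request and counts against the daily limit):
# print(fetch_quality("transformers", "mbzuai-oryx", "LLMVoX"))
```

Because the keyless tier is limited to 100 requests/day, it is worth caching the returned text locally rather than refetching it on every run.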
Higher-rated alternatives
edwko/OuteTTS
Interface for OuteTTS models.
fluxions-ai/vui
100M parameter lightweight conversational text-to-speech model with breaths, laughter,...
OpenVoiceOS/ovos-audio-transformer-plugin-ggwave
Data-over-sound plugin.
inboxpraveen/LLM-Minutes-of-Meeting
🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates...
Aratako/T5Gemma-TTS
Multilingual TTS model with voice cloning and duration control, based on T5Gemma encoder-decoder LLM