mbzuai-oryx/LLMVoX

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

/ 100

Emerging

LLMVoX is a streaming text-to-speech model for building responsive, voice-based conversational AI systems. It takes the text output of any Large Language Model (LLM) or Vision-Language Model (VLM) and converts it into natural-sounding speech as the text is generated, which enables real-time spoken dialogue. It suits developers building interactive voice agents, virtual assistants, or any application that needs an LLM to "speak" quickly and clearly.

299 stars. No commits in the last 6 months.

Use this if you are a developer who wants to add high-quality, low-latency streaming speech generation to an LLM application without fine-tuning the LLM itself.

Not ideal if you need a non-streaming, batch text-to-speech solution or if you don't have access to modern GPU hardware.

conversational-ai voice-assistants speech-synthesis LLM-integration real-time-audio

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 18 / 25

Stars: 299

Forks:

Language: Python

License: —

Higher-rated alternatives

edwko/OuteTTS

Interface for OuteTTS models.

fluxions-ai/vui

100M parameter lightweight conversational text-to-speech model with breaths, laughter,...

OpenVoiceOS/ovos-audio-transformer-plugin-ggwave

data over sound plugin

inboxpraveen/LLM-Minutes-of-Meeting

🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates...

Aratako/T5Gemma-TTS

Multilingual TTS model with voice cloning and duration control, based on T5Gemma encoder-decoder LLM
