fluxions-ai/vui
100M parameter lightweight conversational text-to-speech model with breaths, laughter, multi-speaker dialogue, voice cloning, and streaming. Llama-based, on-device.
This project turns written text into natural-sounding conversational speech. You input text, optionally tagged for non-verbal sounds such as breaths or laughs, and it outputs audio that sounds like a human speaker, with support for multi-speaker dialogue and voice cloning. It's useful for content creators, podcasters, educators, or anyone who needs realistic spoken dialogue in their projects.
Use this if you need to generate conversational, human-like speech from text, including non-verbal cues and multi-speaker dialogue, and want it to run efficiently on standard computer hardware.
Not ideal if you require extremely precise control over every vocal nuance for highly sensitive applications, as the model may occasionally produce unexpected speech.
Stars
641
Forks
63
Language
Python
License
MIT
Category
Last pushed
Feb 25, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/fluxions-ai/vui"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related models
edwko/OuteTTS
Interface for OuteTTS models.
OpenVoiceOS/ovos-audio-transformer-plugin-ggwave
data over sound plugin
inboxpraveen/LLM-Minutes-of-Meeting
🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates...
mbzuai-oryx/LLMVoX
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
Aratako/T5Gemma-TTS
Multilingual TTS model with voice cloning and duration control, based on T5Gemma encoder-decoder LLM