fixie-ai/ultravox
A fast multimodal LLM for real-time voice
Ultravox helps you build applications that understand and respond to human speech in real time, with low latency. It takes live audio input and produces text output, which makes it well suited to interactive voice agents. Developers building voice-enabled tools and platforms can use it to create responsive conversational experiences.
Use this if you need an AI model that can process live spoken words and immediately output text for extremely fast, natural voice interactions.
Not ideal if your primary need is for offline audio transcription or if you're looking for a model that outputs spoken responses directly, as this currently outputs text.
Stars: 4,377
Forks: 367
Language: Python
License: MIT
Category:
Last pushed: Dec 12, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/fixie-ai/ultravox"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
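The same request can be made from Python. The sketch below is a minimal, unofficial example using only the standard library. The function names `quality_url` and `fetch_quality` are hypothetical, treating the `transformers` path segment as a category is an assumption read off the URL above, and the response is assumed (not confirmed) to be JSON, so the code falls back to raw text if parsing fails. How an API key would be supplied is not stated on this page, so the sketch covers only the keyless request.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL.

    `category` is an assumed name for the path segment before the repo
    (e.g. "transformers"); `repo` is "owner/name".
    """
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the body is JSON. If it is not, the raw text is returned
    instead, so the caller can inspect what the service actually sent.
    """
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_quality("transformers", "fixie-ai/ultravox")` issues the same GET request as the curl command. Given the 100 requests/day keyless limit, caching the result locally is probably worthwhile when polling many repositories.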
Related models
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
bytedance/video-SALMONN-2
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates...