fixie-ai/ultravox

A fast multimodal LLM for real-time voice

51 / 100

Established

Ultravox helps you build applications that understand and respond to human speech in real time, with low latency. It takes live audio input and produces text output directly, without a separate speech-recognition step, which makes it well suited to interactive voice agents. Developers building voice-enabled tools and platforms use it to create highly responsive conversational experiences.

4,377 stars.

Use this if you need an AI model that can process live speech and immediately respond in text, for fast, natural voice interactions.

Not ideal if your primary need is offline audio transcription, or if you need a model that produces spoken responses directly; Ultravox currently outputs text only.

voice-AI real-time-transcription conversational-AI speech-recognition voice-assistants

No package, no dependents

Maintenance 6 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 19 / 25


Stars: 4,377

Forks: 367

Language: Python

License: MIT

Related models

KimMeen/Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...

om-ai-lab/VLM-R1

Solve Visual Understanding with Reinforced VLMs

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

NVlabs/OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

bytedance/video-SALMONN-2

video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates...
