ictnlp/LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Emerging

LLaMA-Omni helps you have natural, fast voice conversations with an AI. You speak to the AI, and it quickly understands your speech, generates a text response, and speaks its answer back to you. This is ideal for anyone needing quick, spoken information or interaction, like a customer service agent interacting with a bot or a language learner practicing conversation.

3,128 stars. No commits in the last 6 months.

Use this if you need an AI that can understand spoken questions and respond instantly with both text and high-quality generated speech.

Not ideal if your primary need is for purely text-based AI interaction or if you require an AI for commercial products without obtaining a specific license.

voice-assistants speech-to-text text-to-speech conversational-ai interactive-systems

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 19 / 25

Stars 3,128

Forks 222

Language Python

License Apache-2.0

Higher-rated alternatives

KimMeen/Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...

om-ai-lab/VLM-R1

Solve Visual Understanding with Reinforced VLMs

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

NVlabs/OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

fixie-ai/ultravox

A fast multimodal LLM for real-time voice
