YuanGongND/ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
This project provides a model that takes audio or speech recordings as input and answers open-ended, natural language questions about what it "hears." It is aimed at researchers and developers working on audio analysis and conversational AI.
472 stars. No commits in the last 6 months.
Use this if you need to understand the content and context of audio or speech recordings by asking natural language questions and receiving detailed answers.
Not ideal if you only need basic transcriptions or simple classification of audio events without requiring deep, conversational understanding.
Stars: 472
Forks: 41
Language: Python
License: not listed
Category:
Last pushed: Apr 24, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/YuanGongND/ltu"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
jncraton/languagemodels
Explore large language models in 512MB of RAM
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
haizelabs/verdict
Inference-time scaling for LLMs-as-a-judge.
albertan017/LLM4Decompile
Reverse Engineering: Decompiling Binary Code with Large Language Models
bytedance/Sa2VA
Official Repo For Pixel-LLM Codebase