Honee-W/U-SAM

Official repository for U-SAM (Interspeech 2025)

/ 100

Emerging

U-SAM helps researchers and developers working with audio by providing a single system to understand speech, general audio, and music. It takes raw audio inputs and can interpret what's happening within those sounds, allowing for a wide range of audio-language applications. This is for anyone creating tools or systems that need to process and understand diverse types of audio content.

No commits in the last 6 months.

Use this if you are developing applications that need to interpret or categorize different types of audio, from spoken words to musical compositions and environmental sounds.

Not ideal if you are an end-user looking for a ready-to-use application; this is a foundational model for developers to build upon.

audio-analysis speech-recognition music-information-retrieval sound-event-detection audio-language-processing

No License Stale 6m No Package No Dependents

Maintenance 2 / 25

Adoption 7 / 25

Maturity 8 / 25

Community 13 / 25

How are scores calculated?

Stars

Forks

Language

Python

License

—

Higher-rated alternatives

TinyLLaVA/TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models

zjunlp/EasyInstruct

[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.

rese1f/MovieChat

[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

haotian-liu/LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

NVlabs/Eagle

Eagle: Frontier Vision-Language Models with Data-Centric Strategies

Explore Transformer Models

All categories Trending Transformer directory Insights