bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

54 / 100 (Established)

This project offers tools to build or use advanced AI models that can understand and generate text from various types of input, including audio and video. It helps with tasks like creating detailed captions for videos, answering questions about video content, or evaluating the quality of spoken audio. People who need to process and interpret complex multimedia data for tasks such as content analysis, media management, or accessibility will find this useful.

1,392 stars.

Use this if you need to develop or implement AI systems that can accurately process and respond to information presented in video, audio, and text formats.

Not ideal if you are looking for a simple, off-the-shelf application for basic text-only processing or image recognition.

Tags: video-captioning, audio-analysis, multimedia-content-understanding, speech-quality-assessment, AI-development
No package, no dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 18 / 25
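
The four category scores above add up to the overall score (10 + 10 + 16 + 18 = 54). The page does not say how the overall score is derived, so the short Python check below assumes it is a plain sum of the categories. The dictionary name and layout are illustrative only.

```python
# Category scores as listed on this page, each out of 25.
# Assumption: the overall score is the unweighted sum of the four categories.
category_scores = {
    "Maintenance": 10,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 18,
}

total = sum(category_scores.values())
max_total = 25 * len(category_scores)

print(f"{total} / {max_total}")
```

Under that assumption the result matches the 54 / 100 shown at the top of the listing.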


Stars: 1,392
Forks: 112
Language: (not listed)
License: Apache-2.0
Last pushed: Feb 03, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/bytedance/SALMONN"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
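
The same request can be made from a script. The Python sketch below uses only the standard library and rests on several assumptions: that the path follows a `category/owner/repo` pattern (inferred from the single URL shown above), that the response body is JSON, and that the helper names `quality_url` and `fetch_quality` are hypothetical. The response fields are not documented here, so the sketch returns the parsed body without interpreting it.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner, repo, category="transformers"):
    """Build the endpoint URL (assumed pattern: category/owner/repo)."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(owner, repo, category="transformers", timeout=10):
    """Fetch the quality record; assumes the endpoint returns JSON."""
    url = quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("bytedance", "SALMONN")
```

Since unauthenticated access is limited to 100 requests/day, it is worth caching the result locally rather than calling `fetch_quality` repeatedly.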