bytedance/Sa2VA

Official Repo For Pixel-LLM Codebase

Quality score: 54 / 100 (Established)

This tool helps creative professionals and analysts understand and interact with the content of images and videos. You provide an image or video, along with a natural language instruction or question, and it can identify and highlight specific objects (like 'the girl in the yellow dress') or provide a description of the scene. This is useful for anyone needing to precisely locate elements or extract detailed information from visual media.

1,558 stars.

Use this if you need to precisely segment objects within images or videos based on descriptive text, or if you want to ask questions about visual content and receive detailed, grounded answers.

Not ideal if your primary need is general image classification, simple object detection, or basic video summarization without dense, interactive understanding.

video-analysis image-segmentation content-understanding visual-search media-asset-management
No package, no dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 18 / 25
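The four category scores listed above appear to add up to the overall score. The short check below only verifies that arithmetic; it assumes a plain sum of four categories worth 25 points each and does not claim to reproduce how the site actually computes its scores.

```python
# Sub-scores copied from the listing above (each out of 25).
subscores = {
    "Maintenance": 10,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 18,
}

# Assumption: the overall score is the plain sum of the four categories.
total = sum(subscores.values())
max_total = 25 * len(subscores)

print(f"{total} / {max_total}")
```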


Stars: 1,558
Forks: 114
Language: Python
License: Apache-2.0
Last pushed: Feb 27, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/bytedance/Sa2VA"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
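For scripted use, the same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the `transformers` path segment is treated as a category name purely because it appears in the URL above, and the response is assumed to be JSON (its fields are not documented here, so the sketch does not read any specific key).

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner, repo, category="transformers"):
    """Build the endpoint URL (hypothetical helper).

    Assumption: the path is <category>/<owner>/<repo>, inferred only from
    the single example URL shown on this page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(owner, repo, category="transformers", timeout=10):
    """Fetch the quality record and decode it (hypothetical helper).

    Assumption: the endpoint returns a JSON body. No API key is sent,
    which matches the keyless tier described above.
    """
    url = quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("bytedance", "Sa2VA")` issues the request shown in the curl command. With the keyless limit of 100 requests/day, caching the result locally is probably sensible for repeated runs.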