ruohaoguo/ovavss

Official Implementation of "Open-Vocabulary Audio-Visual Semantic Segmentation" [ACM MM 2024 Oral].

Quality score: 21 / 100 (Experimental)

This project helps video editors, content creators, and media analysts identify and categorize every sound-producing object in a video, including object categories the model was never trained on. You provide a video, and it outputs a segmented video in which each sounding object (such as a barking dog, a car engine, or a person speaking) is highlighted and labeled. This is useful for anyone who needs to isolate or understand the auditory and visual components of complex video scenes.

No commits in the last 6 months.

Use this if you need to accurately identify and segment all sounding objects in videos, including categories the model was not explicitly trained on.

Not ideal if your primary goal is simple object detection or if you only need to segment a pre-defined, small set of known objects.

video-analysis content-moderation media-production scene-understanding sound-source-localization
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 8 / 25
Community 6 / 25


Stars: 35
Forks: 2
Language: Python
License: None
Last pushed: Nov 02, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/ruohaoguo/ovavss"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
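For scripted access, the same request can be made from Python using only the standard library. This is a minimal sketch rather than an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single curl example above, and the response is assumed to be JSON (its schema is not documented here).

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, as in the curl example.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for one repository.

    Assumption: the response body is JSON; the field names are not known here,
    so the decoded object is returned as-is.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs a network request, so it is left commented out):
# data = fetch_quality("computer-vision", "ruohaoguo", "ovavss")
# print(json.dumps(data, indent=2))
```

Each call counts against the 100 requests/day keyless limit, so it is worth caching the result if the script runs often.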