sauradip/STALE
[ECCV 2022] Official PyTorch implementation of the paper "Zero-Shot Temporal Action Detection via Vision-Language Prompting"
This project helps video analysts and content managers automatically find and label specific actions within videos, including actions the model never saw during training. You provide video footage and the names or text descriptions of the actions you're looking for, and it outputs the start and end times of each detected action in the video. This is useful for anyone who needs to pinpoint events quickly in large video archives.
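The output described above is a list of (start, end) time pairs. As a rough illustration of how such pairs can be derived, the sketch below assumes a model that emits one foreground probability per fixed-length video snippet and groups consecutive above-threshold snippets into segments. The helper `scores_to_segments`, the 0.5-second snippet length, and the 0.5 threshold are hypothetical choices for illustration only; STALE's actual post-processing is not described on this page and may differ.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float, float]  # (start_sec, end_sec, mean_score)


def _make_segment(scores: List[float], start: int, end: int, snippet_sec: float) -> Segment:
    """Convert snippet indices [start, end) into seconds plus the run's mean score."""
    run = scores[start:end]
    return (start * snippet_sec, end * snippet_sec, sum(run) / len(run))


def scores_to_segments(
    scores: List[float], snippet_sec: float, threshold: float = 0.5
) -> List[Segment]:
    """Hypothetical helper: group consecutive snippets with score >= threshold into segments."""
    segments: List[Segment] = []
    start: Optional[int] = None  # index where the current above-threshold run began
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            segments.append(_make_segment(scores, start, i, snippet_sec))
            start = None
    if start is not None:  # close a run that reaches the end of the video
        segments.append(_make_segment(scores, start, len(scores), snippet_sec))
    return segments


# Made-up per-snippet foreground scores for a 3-second clip split into 0.5 s snippets.
scores = [0.1, 0.7, 0.9, 0.2, 0.6, 0.8]
for start_s, end_s, conf in scores_to_segments(scores, snippet_sec=0.5):
    print(f"{start_s:.1f}s to {end_s:.1f}s (score {conf:.2f})")
# 0.5s to 1.5s (score 0.80)
# 2.0s to 3.0s (score 0.70)
```

Real detectors usually produce many overlapping candidate segments and then prune them (for example with non-maximum suppression); simple thresholding is used here only to keep the idea visible.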
113 stars. No commits in the last 6 months.
Use this if you need to detect a wide range of actions in video content without manually training a new model for every single action.
Not ideal if you primarily need to classify entire videos rather than pinpointing specific action segments within them.
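Detecting new actions without retraining relies on the vision-language idea in the paper's title: a video segment's embedding is compared against text embeddings of the candidate action names, so supporting a new action means adding a new text entry. The sketch below shows this general CLIP-style cosine-similarity classification in plain Python. The embeddings are hand-written stand-ins (a real system would obtain them from pretrained vision and text encoders), the temperature of 0.01 is an assumed value, and `classify_segment` is a hypothetical helper, not part of the STALE codebase.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Numerically stable softmax with a temperature (lower = sharper)."""
    peak = max(logits)
    exps = [math.exp((x - peak) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_segment(
    segment_emb: Sequence[float],
    class_text_embs: Dict[str, Sequence[float]],
    temperature: float = 0.01,  # assumed value for illustration
) -> Tuple[str, float]:
    """Hypothetical helper: return the best-matching action name and its probability.

    In practice class_text_embs would come from a text encoder applied to
    prompts built from the action names.
    """
    names = list(class_text_embs)
    sims = [cosine(segment_emb, class_text_embs[n]) for n in names]
    probs = softmax(sims, temperature)
    best = max(range(len(names)), key=lambda i: probs[i])
    return names[best], probs[best]


# Hand-written toy embeddings standing in for real encoder outputs.
classes = {"high jump": [1.0, 0.0, 0.0], "swimming": [0.0, 1.0, 0.0]}
segment = [0.9, 0.1, 0.0]
print(classify_segment(segment, classes)[0])  # high jump

# Adding an unseen action is just another text entry; no weights are retrained.
classes["triple jump"] = [0.9, 0.1, 0.0]
print(classify_segment(segment, classes)[0])  # triple jump
```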
Stars: 113
Forks: 11
Language: Python
License: not specified
Category: not listed
Last pushed: Aug 03, 2023
Commits (last 30 days): 0
Higher-rated alternatives
NVlabs/MambaVision
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
sign-language-translator/sign-language-translator
Python library & framework to build custom translators for the hearing-impaired and translate...
kyegomez/Jamba
PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"
autonomousvision/transfuser
[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving;...
kyegomez/MultiModalMamba
A novel implementation of fusing ViT with Mamba into a fast, agile, and high performance...