sauradip/STALE

[ECCV 2022] Official PyTorch implementation of the paper "Zero-Shot Temporal Action Detection via Vision-Language Prompting"


Experimental

This project helps video analysts and content managers automatically find and label specific actions within videos, even actions the model was never trained on. You provide raw video footage and a text description of the actions you are looking for, and it outputs the start and end times of each matching action. This suits anyone who needs to pinpoint events in large video archives.
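To make the input/output contract concrete, here is a minimal sketch of text-driven temporal localization. It is not the repository's model or API: the function names, the fixed snippet length, and the similarity threshold are all hypothetical, and it assumes per-snippet video features and a text feature that already live in a shared embedding space.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def localize(snippet_feats, text_feat, snippet_sec=1.0, threshold=0.5):
    """Return (start_sec, end_sec) spans of consecutive snippets that match the text.

    Hypothetical helper: snippet_feats holds one feature vector per fixed-length
    video snippet, text_feat is the embedding of the action description.
    """
    segments, start = [], None
    for i, feat in enumerate(snippet_feats):
        match = cosine(feat, text_feat) >= threshold
        if match and start is None:
            start = i  # a matching run begins
        elif not match and start is not None:
            segments.append((start * snippet_sec, i * snippet_sec))
            start = None  # the run ended at the previous snippet
    if start is not None:  # close a run that reaches the end of the video
        segments.append((start * snippet_sec, len(snippet_feats) * snippet_sec))
    return segments

# Toy 2-D features standing in for real embeddings (assumed values).
text = [1.0, 0.0]
video = [[0.0, 1.0], [1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(localize(video, text))  # [(1.0, 3.0), (4.0, 5.0)]
```

The sketch only thresholds per-snippet similarity and merges adjacent matches; the project's actual detector is more involved.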

113 stars. No commits in the last 6 months.

Use this if you need to detect a wide range of actions in video content without manually training a new model for every single action.

Not ideal if you primarily need to classify entire videos rather than pinpointing specific action segments within them.

video-analysis content-moderation activity-detection media-management surveillance

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 8 / 25

Community 12 / 25


Stars 113

Forks

Language Python

License None

Higher-rated alternatives

NVlabs/MambaVision

[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone

sign-language-translator/sign-language-translator

Python library & framework to build custom translators for the hearing-impaired and translate...

kyegomez/Jamba

PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"

autonomousvision/transfuser

[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving;...

kyegomez/MultiModalMamba

A novel implementation of fusing ViT with Mamba into a fast, agile, and high performance...
