TheShadow29/VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
VidSitu helps researchers analyze short video clips by identifying the verbs, semantic roles, and relationships between events within them. It takes 10-second movie clips as input and, at 2-second intervals, outputs annotations describing what is happening, who is doing it, and how the different actions are connected. This project is for computer vision researchers and AI model developers working on understanding complex video content.
No commits in the last 6 months.
Use this if you are a computer vision researcher developing or evaluating models for understanding human actions and events in short video segments.
Not ideal if you need a plug-and-play solution for real-time video analysis or for analyzing very long-form video content without segmenting.
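The per-clip annotations described above (a verb plus semantic-role fillers for each 2-second segment, and relations between events) can be sketched as a plain Python structure. All field names and values below are illustrative assumptions for exposition, not the dataset's actual JSON schema:

```python
# Illustrative sketch of a VidSitu-style annotation for one 10-second clip.
# Keys ("video_id", "events", "roles", ...) are assumptions, not the real schema.
clip_annotation = {
    "video_id": "example_clip",          # hypothetical identifier
    "events": [
        {
            "interval": (0, 2),          # seconds within the 10-second clip
            "verb": "run",               # what is happening in this segment
            "roles": {                   # who/what fills each semantic role
                "Arg0": "man in a suit",         # the agent
                "ArgM-Loc": "across the street", # hypothetical location filler
            },
        },
        # ... four more 2-second events would cover the full clip
    ],
    "relations": [
        # how events connect, e.g. event 2 follows from event 1
        {"src": 1, "dst": 2, "relation": "Enables"},
    ],
}

# Each event pairs one verb with its role fillers for that 2-second window.
assert clip_annotation["events"][0]["verb"] == "run"
```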
Stars: 61
Forks: 8
Language: Python
License: MIT
Category:
Last pushed: Aug 17, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/TheShadow29/VidSitu"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
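If you want to call the endpoint from code rather than curl, the URL follows a simple owner/repo pattern. A minimal sketch of building that URL; the response format is not documented here, so fetching and parsing it is left to the caller:

```python
import urllib.parse

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp/"

def quality_api_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a GitHub owner/repo pair."""
    # quote() guards against characters that would break the URL path;
    # ordinary GitHub names pass through unchanged.
    return API_BASE + urllib.parse.quote(owner) + "/" + urllib.parse.quote(repo)

print(quality_api_url("TheShadow29", "VidSitu"))
# https://pt-edge.onrender.com/api/v1/quality/nlp/TheShadow29/VidSitu
```

The URL can then be fetched with `urllib.request.urlopen` or `requests.get`; keep the rate limits above in mind (100 requests/day without a key).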
Higher-rated alternatives
TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
microsoft/XPretrain
Multi-modality pre-training
TheShadow29/zsgnet-pytorch
Official implementation of ICCV19 oral paper Zero-Shot grounding of Objects from Natural...
zeyofu/BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can...
gicheonkang/sglkt-visdial
🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph...