Tanveer81/ReVisionLLM

This is the official implementation of ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos.

Emerging

This project helps video analysts, content creators, or researchers quickly find specific events within very long videos, even those hours in length. You provide a long video and a text description of what you're looking for, and it precisely identifies the start and end times of that event. It's designed for anyone who needs to pinpoint exact moments in extensive video footage without manually scrubbing through everything.

Use this if you need to precisely locate specific events or actions described by text within videos that can be several minutes to many hours long.

Not ideal if your videos are very short (a few seconds) or if you only need to identify broad categories of content rather than specific temporal boundaries.

video-analysis content-moderation footage-review multimedia-search event-detection

No package; no dependents

Maintenance 6 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 5 / 25

Language: Python

License: none listed

Higher-rated alternatives

KimMeen/Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...

om-ai-lab/VLM-R1

Solve Visual Understanding with Reinforced VLMs

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

NVlabs/OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

fixie-ai/ultravox

A fast multimodal LLM for real-time voice