Tanveer81/ReVisionLLM
This is the official implementation of ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos.
This project helps video analysts, content creators, and researchers find specific events within very long videos, including ones that run for hours. You provide a long video and a text description of what you're looking for, and it identifies the start and end times of that event. It's designed for anyone who needs to pinpoint exact moments in extensive footage without manually scrubbing through it.
Use this if you need to precisely locate specific events or actions described by text within videos that can be several minutes to many hours long.
Not ideal if your videos are very short (a few seconds) or if you only need to identify broad categories of content rather than specific temporal boundaries.
Stars: 43
Forks: 2
Language: Python
License: —
Category:
Last pushed: Nov 05, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Tanveer81/ReVisionLLM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
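The same request can be sketched in Python using only the standard library. This is a minimal sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the three path segments are read off the curl command above (treating `transformers` as a category segment is a guess), and the response is assumed to be JSON, since its schema is not documented here.

```python
import json
import urllib.request

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL by following the pattern of the curl example.

    Assumption: the path is <category>/<owner>/<repo>; only the single
    URL shown above is confirmed by the page.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data and decode it.

    Assumption: the body is JSON. The fields it contains are not
    documented on this page, so the parsed value is returned unchanged.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("transformers", "Tanveer81", "ReVisionLLM")` issues the same GET request as the curl command. With the keyless limit of 100 requests/day, it is worth caching the result rather than refetching on every run.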
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice