Wangbiao2/R1-Track
R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning.
This tool automatically tracks a specific object across a video, even if it moves, changes size, or is temporarily obscured. You provide the initial location of the target in the first frame, either by drawing a box around it or by describing it in text. The system then outputs the coordinates of that object in every subsequent frame. It suits anyone who needs to monitor individual subjects in visual recordings, such as in security analysis or sports tracking.
No commits in the last 6 months.
Use this if you need to continuously follow a single object throughout a video sequence and automatically get its precise location in each frame.
Not ideal if you need to track multiple objects simultaneously or detect new objects appearing in the video.
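The workflow described above (one initial box in, one box per frame out) can be sketched as a small Python loop. This is an illustration of the single-object tracking contract only, not R1-Track's actual code or API: `Box`, `track_sequence`, and the `locate` callback are hypothetical names, and the choice to pass the first frame and its box as a fixed template is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")  # any frame representation (array, image, path, ...)


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def track_sequence(
    frames: Sequence[Frame],
    init_box: Box,
    locate: Callable[[Frame, Box, Frame], Box],
) -> List[Box]:
    """Return one box per frame for a single target.

    `locate(template_frame, template_box, search_frame)` is a stand-in for
    whatever model predicts the target's box in a new frame. It is an
    assumption made for this sketch, not a function from the repository.
    """
    if not frames:
        return []
    template = frames[0]
    boxes = [init_box]  # the first frame's box is supplied by the user
    for frame in frames[1:]:
        boxes.append(locate(template, init_box, frame))
    return boxes
```

Because the loop carries exactly one target box, it also shows why this kind of tracker does not cover multiple objects or objects that first appear mid-video.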
Stars: 66
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: May 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Wangbiao2/R1-Track"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
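The same request can be made from Python with the standard library. This is a minimal sketch under stated assumptions: the path layout (`quality/<category>/<owner>/<repo>`) is inferred from the single curl example above, the response is assumed to be JSON, and `build_quality_url` and `fetch_quality` are hypothetical helper names rather than part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is an assumed layout.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response body (assumed to be JSON).

    No response fields are assumed here; inspect the returned object
    before relying on any particular key.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("transformers", "Wangbiao2", "R1-Track")
```

Given the keyless limit of 100 requests/day, it is worth caching the result locally rather than calling `fetch_quality` repeatedly.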
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice