Wangbiao2/R1-Track

R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning.

Emerging

This tool tracks a single target object across a video, even when it moves, changes size, or is temporarily occluded. You specify the target in the first frame, either by drawing a box around it or by describing it in text. The system then outputs the object's bounding-box coordinates in every subsequent frame. It suits anyone who needs to follow an individual subject in visual recordings, for example in security analysis or sports tracking.
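
The input/output contract described above (one initial box in, one box per frame out) can be sketched in Python as follows. This is only an illustration of the single-object tracking loop, not R1-Track's actual API: the names `track`, `predict`, and `hold_still` are hypothetical, and the placeholder predictor simply repeats the previous box where a real model would be called.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed box convention for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def track(
    frames: Sequence[object],
    init_box: Box,
    predict: Callable[[object, object, Box], Box],
) -> List[Box]:
    """Return one box per frame, starting with the user-supplied init_box.

    `predict` is a hypothetical stand-in for the model: it receives the
    first frame, the current frame, and the previous box, and returns the
    box for the current frame.
    """
    if not frames:
        return []
    boxes = [init_box]
    first_frame = frames[0]
    for frame in frames[1:]:
        boxes.append(predict(first_frame, frame, boxes[-1]))
    return boxes


def hold_still(first_frame: object, frame: object, prev_box: Box) -> Box:
    # Placeholder predictor (assumption): repeats the previous box.
    return prev_box


frames = ["frame0", "frame1", "frame2"]  # stand-ins for decoded video frames
result = track(frames, (10.0, 20.0, 50.0, 80.0), hold_still)
```

The loop handles exactly one target, which matches the single-object scope of the tool; tracking several objects would need a separate call per target.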

No commits in the last 6 months.

Use this if you need to continuously follow a single object throughout a video sequence and automatically get its precise location in each frame.

Not ideal if you need to track multiple objects simultaneously or detect new objects appearing in the video.

video-surveillance motion-tracking object-localization visual-analytics sports-analysis

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 9 / 25

Language

Python

License

MIT

Higher-rated alternatives

KimMeen/Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...

om-ai-lab/VLM-R1

Solve Visual Understanding with Reinforced VLMs

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

NVlabs/OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

fixie-ai/ultravox

A fast multimodal LLM for real-time voice
