Reagan1311/Mask2IV
Mask2IV: Interaction-Centric Video Generation via Mask Trajectories (AAAI 2026)
This project helps robotics engineers and researchers create realistic videos of robots or humans interacting with objects. You provide an action description (e.g., "pick up the cup") or spatial cues, and it generates a video depicting that interaction. It is aimed at work on robot learning, manipulation policies, and affordance reasoning.
Use this if you need to generate diverse, controllable video data of interactions for training and evaluating embodied intelligence systems, without manually providing detailed mask annotations.
Not ideal if you need to generate general-purpose videos that don't involve specific actor-object interactions, or if you are looking for a simple, off-the-shelf video editing tool.
Stars: 11
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Jan 30, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/Reagan1311/Mask2IV"
Open to everyone: 100 requests/day with no key; a free key raises the limit to 1,000/day.
Higher-rated alternatives
hao-ai-lab/FastVideo
A unified inference and post-training framework for accelerated video generation.
ModelTC/LightX2V
Light Image Video Generation Inference Framework
thu-ml/TurboDiffusion
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
PKU-YuanGroup/Helios
Helios: Real-Time Long Video Generation Model
PKU-YuanGroup/MagicTime
[TPAMI 2025🔥] MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators