THU-SI/Spatial-MLLM

[NeurIPS 2025] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Quality score: 46 / 100 (Emerging)

This project helps users understand and reason about spatial relationships in video footage. Given video input, it identifies objects, their positions, and how they relate to one another in space, and it answers spatial questions about the scene. It suits professionals in fields such as surveillance, robotics, or video analysis who need detailed spatial information from visual data.

447 stars.

Use this if you need to accurately extract and reason about spatial information from video recordings to understand complex scene layouts or object interactions.

Not ideal if your primary need is general object recognition or activity detection without a strong emphasis on precise spatial understanding and reasoning.

video-analysis robotics-navigation surveillance visual-intelligence scene-understanding
No package, no dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 15 / 25
Community 11 / 25
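The four category scores above add up exactly to the overall 46 / 100. The overall score therefore appears to be a plain sum of the categories, each capped at 25, though the page does not state this. The sketch below assumes that rule and reproduces the total. The names SUBSCORES, MAX_PER_CATEGORY and overall_score are illustrative and not part of any published API.

```python
# Category scores as listed on this page (each out of 25).
# Assumption: the overall score is the plain sum of the four categories.
SUBSCORES = {
    "Maintenance": 10,
    "Adoption": 10,
    "Maturity": 15,
    "Community": 11,
}
MAX_PER_CATEGORY = 25


def overall_score(subscores: dict) -> int:
    """Sum the category scores after checking each lies in [0, MAX_PER_CATEGORY]."""
    for name, value in subscores.items():
        if not 0 <= value <= MAX_PER_CATEGORY:
            raise ValueError(f"{name} score {value} is out of range")
    return sum(subscores.values())


total = overall_score(SUBSCORES)
max_total = MAX_PER_CATEGORY * len(SUBSCORES)
print(f"{total} / {max_total}")  # prints "46 / 100"
```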


Stars: 447
Forks: 17
Language: Python
License: MIT
Last pushed: Feb 05, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/THU-SI/Spatial-MLLM"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
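The same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions. The URL shape is copied from the curl example. Treating the "transformers" path segment as a category, along with the helper names quality_url and fetch_quality, is hypothetical. The code also assumes the endpoint returns a JSON body, whose fields are not documented here. No key is sent, because the page does not say how a key is passed.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build an endpoint URL in the same shape as the curl example.

    Assumption: the first path segment ("transformers" in the example)
    is a category; each segment is percent-encoded individually.
    """
    parts = (category, owner, repo)
    return BASE_URL + "/" + "/".join(urllib.parse.quote(p, safe="") for p in parts)


def fetch_quality(url: str, timeout: float = 10.0):
    """GET the URL and parse the body as JSON (assumed response format).

    Unauthenticated requests fall under the 100 requests/day limit.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a network request):
#   data = fetch_quality(quality_url("transformers", "THU-SI", "Spatial-MLLM"))
```

Building the URL separately from fetching it keeps the path construction easy to check offline. Percent-encoding each segment guards against owner or repository names that contain reserved characters.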