InternLM/Spatial-SSRL

[CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"

/ 100

Emerging

This project improves how Large Vision-Language Models (LVLMs) understand the spatial relationships between objects in images and videos. You supply ordinary images or video clips, and the project enhances the model's ability to describe locations, sizes, and relative positions accurately, without manually annotated labels. It is aimed at researchers and developers building or evaluating AI vision systems.

116 stars.

Use this if you are a researcher or developer working with LVLMs and want to improve their spatial reasoning without extensive manual data annotation.

Not ideal if you need a plug-and-play application for end-user tasks: this is a framework for improving the underlying model, not a finished product.

AI model training, computer vision research, spatial reasoning, large vision language models, robotics perception

No Package, No Dependents

Maintenance 10 / 25

Adoption 10 / 25

Maturity 13 / 25

Community 7 / 25

Stars 116

Forks

Language Python

License Apache-2.0

Higher-rated alternatives

col14m/cadrille

[ICLR2026] cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning

filaPro/cad-recode

[ICCV2025] CAD-Recode: Reverse Engineering CAD Code from Point Clouds

pengsongyou/openscene

[CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies

worldbench/3EED

[NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D

cambrian-mllm/cambrian-s

Cambrian-S: Towards Spatial Supersensing in Video
