daveredrum/ScanRefer
[ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
This project helps locate specific objects within 3D scans of indoor environments. You provide a 3D point cloud of a scanned room and a natural language description, like "the small red chair near the window." The system outputs the precise 3D bounding box around that described object. This tool is useful for researchers and engineers working with 3D scene understanding and object interaction.
295 stars. No commits in the last 6 months.
Use this if you need to programmatically identify and pinpoint objects in 3D scanned scenes using human-readable descriptions.
Not ideal if your input data consists of traditional 2D images or if you don't require highly precise 3D localization.
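To make the input and output shapes in the description concrete, here is a minimal sketch of what a 3D bounding box around a set of points looks like. It is not ScanRefer's model or API. The function name `axis_aligned_box` and the center-plus-size box representation are assumptions chosen for illustration only.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def axis_aligned_box(points: Iterable[Point]) -> Tuple[Point, Point]:
    """Return (center, size) of the axis-aligned box enclosing the points.

    Hypothetical helper: it only illustrates the kind of output a 3D
    localization system produces, not how ScanRefer computes it.
    """
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point")
    xs, ys, zs = zip(*pts)
    lo = (min(xs), min(ys), min(zs))  # minimum corner of the box
    hi = (max(xs), max(ys), max(zs))  # maximum corner of the box
    center = tuple((a + b) / 2 for a, b in zip(lo, hi))
    size = tuple(b - a for a, b in zip(lo, hi))
    return center, size


# Points assumed to belong to the described object (made-up coordinates).
chair_points = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0), (1.0, 1.0, 1.0)]
center, size = axis_aligned_box(chair_points)
```

In a real pipeline the hard part is choosing which points match the description; the box itself is simple geometry once those points are known.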
Stars: 295
Forks: 32
Language: Python
License: none listed
Category:
Last pushed: Feb 10, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/daveredrum/ScanRefer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
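The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the `category/owner/repo` path layout is inferred from the single URL above, the response is assumed to be JSON, and the names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path layout assumed from the one example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the listing data without an API key.

    Assumption: the endpoint returns a JSON body. If it does not,
    json.loads raises a ValueError.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body)
```

Calling `fetch_quality("computer-vision", "daveredrum", "ScanRefer")` issues the same GET request as the curl command. Each call counts against the 100 requests/day keyless limit, so cache results if you poll regularly.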
Higher-rated alternatives
peteanderson80/Matterport3DSimulator
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
cambridgeltl/visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
clairecyq/whos-waldo
Who's Waldo? Linking People Across Text and Images. ICCV 2021.
TheShadow29/vognet-pytorch
[CVPR20] Video Object Grounding using Semantic Roles in Language Description...
jianghaojun/Awesome-3D-Vision-and-Language
A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D...