Multimodal Visual Grounding Computer Vision Tools

8 multimodal visual grounding tools are tracked. One scores above 50 (Established tier). The highest-rated is peteanderson80/Matterport3DSimulator at 51/100, with 683 stars.

Get all 8 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=computer-vision&subcategory=multimodal-visual-grounding&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
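The curl command above can be wrapped in a small script so the same request is easy to repeat or point at another subcategory. This is a minimal sketch, assuming the endpoint accepts the same three query parameters (domain, subcategory, limit) for other values. The helper names `quality_url` and `fetch_quality` are hypothetical. The JSON response format is not documented on this page, so the script saves the response to a file instead of parsing it. Authenticated requests are left out because the page does not say how the key is passed.

```shell
#!/bin/sh
# Endpoint copied from the curl example; only the three query parameters
# shown there (domain, subcategory, limit) are used.
BASE_URL="https://pt-edge.onrender.com/api/v1/datasets/quality"

# Hypothetical helper: print the request URL for domain ($1),
# subcategory ($2) and limit ($3). Values are inserted verbatim
# (no URL encoding), so pass only URL-safe slugs like the ones above.
quality_url() {
  printf '%s?domain=%s&subcategory=%s&limit=%s\n' "$BASE_URL" "$1" "$2" "$3"
}

# Hypothetical helper: fetch the JSON and save it as "<subcategory>.json".
# -f: exit non-zero on HTTP errors (status >= 400)
# -s: no progress meter; -S: still print errors when -s is set
# Unauthenticated, so each call counts against the 100 requests/day limit.
fetch_quality() {
  curl -fsS -o "$2.json" "$(quality_url "$1" "$2" "$3")"
}

# Usage (not run here):
# fetch_quality computer-vision multimodal-visual-grounding 20
```

Keeping URL construction in its own function lets you check the request string without using up any of the daily quota.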

# | Tool | Score | Tier | Description
1 | peteanderson80/Matterport3DSimulator | 51 | Established | AI Research Platform for Reinforcement Learning from Real Panoramic Images.
2 | daveredrum/ScanRefer | 41 | Emerging | [ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
3 | cambridgeltl/visual-spatial-reasoning | 38 | Emerging | [TACL'23] VSR: A probing benchmark for spatial understanding of...
4 | clairecyq/whos-waldo | 36 | Emerging | Who's Waldo? Linking People Across Text and Images. ICCV 2021.
5 | TheShadow29/vognet-pytorch | 35 | Emerging | [CVPR20] Video Object Grounding using Semantic Roles in Language Description...
6 | jianghaojun/Awesome-3D-Vision-and-Language | 32 | Emerging | A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D...
7 | ChenBarryHu/TransformerVG | 18 | Experimental | TransformerVG - 3D Visual Grounding with Transformers
8 | fpsluozi/tofindwaldo | 12 | Experimental | Official Repo for "To Find Waldo You Need Contextual Cues: Debiasing Who’s...