Multimodal Visual Grounding Computer Vision Tools

There are 8 multimodal visual grounding tools tracked. One scores above 50 (Established tier). The highest-rated is peteanderson80/Matterport3DSimulator at 51/100 with 683 stars.

Get all 8 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=computer-vision&subcategory=multimodal-visual-grounding&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
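A minimal shell sketch of the same request, with the subcategory and limit turned into parameters. Only the endpoint and its three query parameters (domain, subcategory, limit) come from the example above. The function names build_url and fetch_quality are hypothetical, and the response format is not described here, so the sketch just prints the raw JSON.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the documented quality endpoint in two helper functions.
# build_url and fetch_quality are illustrative names, not part of the API.
set -euo pipefail

BASE_URL="https://pt-edge.onrender.com/api/v1/datasets/quality"

# Print the request URL; defaults mirror the example command
# (subcategory=multimodal-visual-grounding, limit=20).
build_url() {
  local subcategory="${1:-multimodal-visual-grounding}"
  local limit="${2:-20}"
  printf '%s?domain=computer-vision&subcategory=%s&limit=%s\n' \
    "$BASE_URL" "$subcategory" "$limit"
}

# Fetch the JSON. -f makes curl exit non-zero on HTTP error responses;
# -sS hides the progress meter but still reports errors.
fetch_quality() {
  curl -fsS "$(build_url "$@")"
}
```

For example, `fetch_quality multimodal-visual-grounding 8` requests exactly the 8 tracked projects. Each call counts toward the daily request limit.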

#	Tool	Description	Score (/100)	Tier	Stars	Language
1	peteanderson80/Matterport3DSimulator	AI Research Platform for Reinforcement Learning from Real Panoramic Images.	51	Established	683	C++
2	daveredrum/ScanRefer	[ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language	41	Emerging	295	Python
3	cambridgeltl/visual-spatial-reasoning	[TACL'23] VSR: A probing benchmark for spatial understanding of...	38	Emerging	140	Python
4	clairecyq/whos-waldo	Who's Waldo? Linking People Across Text and Images. ICCV 2021.	36	Emerging	13	Python
5	TheShadow29/vognet-pytorch	[CVPR20] Video Object Grounding using Semantic Roles in Language Description...	35	Emerging	69	Python
6	jianghaojun/Awesome-3D-Vision-and-Language	A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D...	32	Emerging	101	—
7	ChenBarryHu/TransformerVG	TransformerVG - 3D Visual Grounding with Transformers	18	Experimental	2	Python
8	fpsluozi/tofindwaldo	Official Repo for "To Find Waldo You Need Contextual Cues: Debiasing Who’s...	12	Experimental	7	—