Multimodal Vision-Language Tools (Computer Vision)

22 multimodal vision-language tools are tracked. The highest-rated is col14m/cadrille, with a score of 49/100 and 110 stars.

Get the projects as JSON (the request below sets limit=20, so it may return only 20 of the 22 entries)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=computer-vision&subcategory=multimodal-vision-language&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
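The same request can be made from Python. The following is a minimal sketch using only the standard library. The endpoint and query parameters are copied from the curl example; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the sketch returns the decoded payload unchanged rather than assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain, subcategory, limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and decode the JSON body.

    The response schema is an unknown here, so the decoded object is
    returned as-is for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = build_url("computer-vision", "multimodal-vision-language")
print(url)
# data = fetch_projects(url)  # uncomment to perform the network request
```

Whether the API accepts a `limit` above 20 is not stated on this page, so passing `limit=22` to `build_url` is something to try rather than rely on. Unauthenticated use counts against the 100 requests/day quota.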

#	Tool	Score	Tier	Stars	Language
1	col14m/cadrille [ICLR2026] cadrille: Multi-modal CAD Reconstruction with Online...	49	Emerging	110	Python
2	filaPro/cad-recode [ICCV2025] CAD-Recode: Reverse Engineering CAD Code from Point Clouds	48	Emerging	206	Jupyter Notebook
3	pengsongyou/openscene [CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies	43	Emerging	800	Python
4	worldbench/3EED [NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D	42	Emerging	206	Python
5	cambrian-mllm/cambrian-s Cambrian-S: Towards Spatial Supersensing in Video	42	Emerging	507	Python
6	Gorilla-Lab-SCUT/PaDT [ICLR 2026] Official implementation of "Patch-as-Decodable-Token: Towards...	42	Emerging	251	Python
7	InternLM/Spatial-SSRL [CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial...	40	Emerging	116	Python
8	Davidyao99/uni4d [CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a...	39	Emerging	222	Python
9	TimeBlindness/time-blindness [CVPR 2026 🔥] Time Blindness: Why Video-Language Models Can't See What Humans Can?	38	Emerging	62	Python
10	bagh2178/UniGoal [CVPR 2025] UniGoal: Towards Universal Zero-shot Goal-oriented Navigation	38	Emerging	311	Python
11	Haochen-Wang409/TreeVGR [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation...	38	Emerging	77	Python
12	ajzhai/NeRF2Physics [CVPR 2024] Physical Property Understanding from Language-Embedded Feature Fields	38	Emerging	89	Python
13	IDEA-Research/RexSeek [ICCV2025] Referring any person or objects given a natural language...	38	Emerging	177	Python
14	Sid2697/HOI-Ref Code implementation for paper titled "HOI-Ref: Hand-Object Interaction...	32	Emerging	29	Python
15	taco-group/SparkVSR SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation	29	Experimental	29	Python
16	Haochen-Wang409/ross3d [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness	29	Experimental	67	Python
17	Jiaxuan-Li/EVCap [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name...	28	Experimental	62	Python
18	fz-zsl/QuatRoPE The official implementation for CVPR 2026 paper Scalable Object Relation...	24	Experimental	3	Python
19	Hon-Wong/Elysium [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM	24	Experimental	86	Python
20	BaohaoLiao/road [NeurIPS 2024] 3-in-1: 2D Rotary Adaptation for Efficient Finetuning,...	24	Experimental	2	—
21	sled-group/3D-GRAND [CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs	21	Experimental	53	—
22	AIoT-MLSys-Lab/Famba-V [ECCV 2024 Workshop Best Paper Award] Famba-V: Fast Vision Mamba with...	18	Experimental	34	Python
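If the table above is copied as text rather than fetched through the API, it can be parsed directly. The sketch below assumes the columns are tab-separated exactly as shown and that the Tool column holds the repository name followed by a space and the description. The function name `parse_rows` and the output keys are hypothetical, and only two rows are reproduced as sample input.

```python
import csv
import io

# Two rows copied from the table above, with the header line (tab-separated).
SAMPLE = (
    "#\tTool\tScore\tTier\tStars\tLanguage\n"
    "1\tcol14m/cadrille [ICLR2026] cadrille: Multi-modal CAD Reconstruction with Online...\t49\tEmerging\t110\tPython\n"
    "15\ttaco-group/SparkVSR SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation\t29\tExperimental\t29\tPython\n"
)


def parse_rows(text):
    """Parse the tab-separated table into a list of dicts with typed fields."""
    # QUOTE_NONE keeps quotation marks inside descriptions as literal text.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        # Assumption: the repository name is everything before the first space.
        repo, _, description = row["Tool"].partition(" ")
        rows.append({
            "repo": repo,
            "description": description,
            "score": int(row["Score"]),
            "tier": row["Tier"],
            "stars": int(row["Stars"]),
            "language": row["Language"],
        })
    return rows


rows = parse_rows(SAMPLE)
emerging = [r["repo"] for r in rows if r["tier"] == "Emerging"]
print(emerging)
```

Once parsed, the rows can be re-sorted by any column, for example `sorted(rows, key=lambda r: r["stars"], reverse=True)` to rank by stars instead of score.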