Multimodal Vision Language Computer Vision Tools

There are 22 multimodal vision-language tools tracked. The highest-rated is col14m/cadrille, scoring 49/100 with 110 stars.

Get all 22 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=computer-vision&subcategory=multimodal-vision-language&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
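The same request can be made from a script. The following is a minimal sketch in Python using only the standard library; it rebuilds the URL from the curl command above and parses the response as JSON. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented here, so the sketch returns the parsed body without assuming any fields. Note that the URL above uses `limit=20` while 22 projects are listed; whether the API accepts a larger `limit` is an assumption to verify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="computer-vision",
              subcategory="multimodal-vision-language",
              limit=20):
    """Build the query URL (hypothetical helper; parameter names come from the curl example)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and return the parsed JSON body (response schema is not assumed)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects(build_url())` performs the same unauthenticated request as the curl command, so it counts against the 100 requests/day allowance.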

#   Tool                          Score  Tier
1   col14m/cadrille               49     Emerging
    [ICLR2026] cadrille: Multi-modal CAD Reconstruction with Online...
2   filaPro/cad-recode            48     Emerging
    [ICCV2025] CAD-Recode: Reverse Engineering CAD Code from Point Clouds
3   pengsongyou/openscene         43     Emerging
    [CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies
4   worldbench/3EED               42     Emerging
    [NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D
5   cambrian-mllm/cambrian-s      42     Emerging
    Cambrian-S: Towards Spatial Supersensing in Video
6   Gorilla-Lab-SCUT/PaDT         42     Emerging
    [ICLR 2026] Official implementation of "Patch-as-Decodable-Token: Towards...
7   InternLM/Spatial-SSRL         40     Emerging
    [CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial...
8   Davidyao99/uni4d              39     Emerging
    [CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a...
9   TimeBlindness/time-blindness  38     Emerging
    [CVPR 2026 🔥] Time Blindness: Why Video-Language Models Can't See What Humans Can?
10  bagh2178/UniGoal              38     Emerging
    [CVPR 2025] UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
11  Haochen-Wang409/TreeVGR       38     Emerging
    [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation...
12  ajzhai/NeRF2Physics           38     Emerging
    [CVPR 2024] Physical Property Understanding from Language-Embedded Feature Fields
13  IDEA-Research/RexSeek         38     Emerging
    [ICCV2025] Referring any person or objects given a natural language...
14  Sid2697/HOI-Ref               32     Emerging
    Code implementation for paper titled "HOI-Ref: Hand-Object Interaction...
15  taco-group/SparkVSR           29     Experimental
    SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation
16  Haochen-Wang409/ross3d        29     Experimental
    [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
17  Jiaxuan-Li/EVCap              28     Experimental
    [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name...
18  fz-zsl/QuatRoPE               24     Experimental
    The official implementation for CVPR 2026 paper Scalable Object Relation...
19  Hon-Wong/Elysium              24     Experimental
    [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
20  BaohaoLiao/road               24     Experimental
    [NeurIPS 2024] 3-in-1: 2D Rotary Adaptation for Efficient Finetuning,...
21  sled-group/3D-GRAND           21     Experimental
    [CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs
22  AIoT-MLSys-Lab/Famba-V        18     Experimental
    [ECCV 2024 Workshop Best Paper Award] Famba-V: Fast Vision Mamba with...