Multimodal Visual Grounding Transformer Models

There are 9 multimodal visual grounding models tracked. The highest-rated is gabeur/mmt at 45/100 with 265 stars.

Get all 9 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=multimodal-visual-grounding&limit=20"

Open to everyone: 100 requests/day with no key needed. Get a free key to raise the limit to 1,000/day.
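The same query can be made from Python. A minimal sketch follows; the endpoint and query parameters come from the curl example above, but the response field names (`name`, `score`, `tier`) are assumptions about the JSON shape, not documented by the API.

```python
import json
import urllib.parse

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain, subcategory, limit=20):
    """Build the dataset query URL with properly encoded parameters."""
    params = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE}?{params}"

url = build_url("transformers", "multimodal-visual-grounding")
print(url)

# Filtering a decoded response; this payload is a stand-in illustrating
# the assumed shape, not real API output.
sample = json.loads(
    '[{"name": "gabeur/mmt", "score": 45, "tier": "Emerging"},'
    ' {"name": "benywon/LALM", "score": 26, "tier": "Experimental"}]'
)
emerging = [m["name"] for m in sample if m["tier"] == "Emerging"]
print(emerging)  # → ['gabeur/mmt']
```

A real request would pass the built URL to `urllib.request.urlopen` (or `requests.get`) and decode the body with `json.loads`.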

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | gabeur/mmt | Multi-Modal Transformer for Video Retrieval | 45 | Emerging |
| 2 | JerryYLi/valhalla-nmt | Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for... | 35 | Emerging |
| 3 | MichiganNLP/Scalable-VLM-Probing | Probe Vision-Language Models | 32 | Emerging |
| 4 | benywon/LALM | code and resource for ACL2021 paper 'Multi-Lingual Question Generation with... | 26 | Experimental |
| 5 | thunlp/cost-optimal-gqa | The code for the paper "Cost-Optimal Grouped-Query Attention for... | 25 | Experimental |
| 6 | PRITHIVSAKTHIUR/Molmo2-HF-Demo | A Gradio-based demonstration for the AllenAI Molmo2-8B multimodal model,... | 22 | Experimental |
| 7 | aimagelab/JARVIS | Seeing Beyond Words: Self-Supervised Visual Learning for Multimodal Large... | 22 | Experimental |
| 8 | Skyline-9/Shotluck-Holmes | [ACM MMGR '24] 🔍 Shotluck Holmes: A family of small-scale LLVMs for... | 21 | Experimental |
| 9 | workforyou786/Large-Language-Model-Research-Paper | Multimodal AI — systems that can understand and generate information across... | 19 | Experimental |