Multimodal Visual Grounding Transformer Models
This index tracks 9 multimodal visual grounding models. The highest-rated is gabeur/mmt, with a quality score of 45/100 and 265 stars.
Get all 9 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=multimodal-visual-grounding&limit=20"
```
The endpoint is open to everyone: 100 requests/day with no API key. A free key raises the limit to 1,000 requests/day.
| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | gabeur/mmt | Multi-Modal Transformer for Video Retrieval | 45 | Emerging |
| 2 | JerryYLi/valhalla-nmt | Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for... | | Emerging |
| 3 | MichiganNLP/Scalable-VLM-Probing | Probe Vision-Language Models | | Emerging |
| 4 | benywon/LALM | code and resource for ACL2021 paper 'Multi-Lingual Question Generation with... | | Experimental |
| 5 | thunlp/cost-optimal-gqa | The code for the paper "Cost-Optimal Grouped-Query Attention for... | | Experimental |
| 6 | PRITHIVSAKTHIUR/Molmo2-HF-Demo | A Gradio-based demonstration for the AllenAI Molmo2-8B multimodal model,... | | Experimental |
| 7 | aimagelab/JARVIS | Seeing Beyond Words: Self-Supervised Visual Learning for Multimodal Large... | | Experimental |
| 8 | Skyline-9/Shotluck-Holmes | [ACM MMGR '24] 🔍 Shotluck Holmes: A family of small-scale LLVMs for... | | Experimental |
| 9 | workforyou786/Large-Language-Model-Research-Paper | Multimodal AI — systems that can understand and generate information across... | | Experimental |