Multimodal Visual Grounding Transformer Models

There are 9 multimodal visual grounding models tracked. The highest-rated is gabeur/mmt at 45/100 with 265 stars.

Get all 9 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=multimodal-visual-grounding&limit=20"

Open to everyone: 100 requests/day with no key needed. Get a free key to raise the limit to 1,000/day.
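The same query can be made from Python. A minimal sketch follows; the endpoint and query parameters come from the curl example above, but the response field names (`name`, `score`, `tier`) are assumptions about the JSON shape, not documented by the API.

```python
import json
import urllib.parse

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain, subcategory, limit=20):
    """Build the dataset query URL with properly encoded parameters."""
    params = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE}?{params}"

url = build_url("transformers", "multimodal-visual-grounding")
print(url)

# Filtering a decoded response; this payload is a stand-in illustrating
# the assumed shape, not real API output.
sample = json.loads(
    '[{"name": "gabeur/mmt", "score": 45, "tier": "Emerging"},'
    ' {"name": "benywon/LALM", "score": 26, "tier": "Experimental"}]'
)
emerging = [m["name"] for m in sample if m["tier"] == "Emerging"]
print(emerging)  # → ['gabeur/mmt']
```

A real request would pass the built URL to `urllib.request.urlopen` (or `requests.get`) and decode the body with `json.loads`.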

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | gabeur/mmt | Multi-Modal Transformer for Video Retrieval | 45 | Emerging |
| 2 | JerryYLi/valhalla-nmt | Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for... | 35 | Emerging |
| 3 | MichiganNLP/Scalable-VLM-Probing | Probe Vision-Language Models | 32 | Emerging |
| 4 | benywon/LALM | code and resource for ACL2021 paper 'Multi-Lingual Question Generation with... | 26 | Experimental |
| 5 | thunlp/cost-optimal-gqa | The code for the paper "Cost-Optimal Grouped-Query Attention for... | 25 | Experimental |
| 6 | PRITHIVSAKTHIUR/Molmo2-HF-Demo | A Gradio-based demonstration for the AllenAI Molmo2-8B multimodal model,... | 22 | Experimental |
| 7 | aimagelab/JARVIS | Seeing Beyond Words: Self-Supervised Visual Learning for Multimodal Large... | 22 | Experimental |
| 8 | Skyline-9/Shotluck-Holmes | [ACM MMGR '24] 🔍 Shotluck Holmes: A family of small-scale LLVMs for... | 21 | Experimental |
| 9 | workforyou786/Large-Language-Model-Research-Paper | Multimodal AI — systems that can understand and generate information across... | 19 | Experimental |