MMStar-Benchmark/MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"
This project helps AI researchers and developers accurately assess the true capabilities of Large Vision-Language Models (LVLMs). It takes evaluation results from your LVLM with and without visual input and produces metrics that reveal how much the visual component genuinely contributes, flagging cases where performance is overestimated because the image is not actually needed. It's designed for those building and refining multimodal AI models.
204 stars. No commits in the last 6 months.
Use this if you are developing or evaluating Large Vision-Language Models and want to accurately measure how much visual information contributes to their performance, verifying that the visual input is truly indispensable for the task.
Not ideal if you are evaluating models that only process text or only process images, as it specifically focuses on the interplay between vision and language.
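To make the with/without-image comparison described above concrete, here is a minimal sketch. The metric names and formulas are assumptions loosely following the paper's multi-modal gain/leakage idea, not the repo's exact implementation; consult the repo's code for the real definitions.

```python
def visual_contribution(acc_with_image: float,
                        acc_without_image: float,
                        acc_text_only_llm: float) -> dict:
    """Summarize how much the visual input genuinely contributes.

    acc_with_image     -- LVLM accuracy when the image is provided
    acc_without_image  -- LVLM accuracy when the image is withheld
    acc_text_only_llm  -- accuracy of the underlying LLM on text alone
    """
    # Benefit attributable to actually seeing the image.
    gain = acc_with_image - acc_without_image
    # Suspicious accuracy gained without vision (e.g., leaked or text-solvable samples).
    leakage = max(0.0, acc_without_image - acc_text_only_llm)
    return {"multi_modal_gain": round(gain, 4),
            "multi_modal_leakage": round(leakage, 4)}

print(visual_contribution(0.57, 0.38, 0.32))
# {'multi_modal_gain': 0.19, 'multi_modal_leakage': 0.06}
```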
Stars: 204
Forks: 5
Language: Python
License: —
Category:
Last pushed: Sep 26, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/MMStar-Benchmark/MMStar"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
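If you prefer Python over curl, a minimal sketch against the same public endpoint (only the URL comes from the listing above; that the response is JSON is an assumption):

```python
import json
import urllib.request

# Same endpoint as the curl example above; no API key needed on the free tier.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/MMStar-Benchmark/MMStar"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)            # assumes the endpoint returns JSON

print(json.dumps(data, indent=2))     # inspect whatever fields come back
```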
Higher-rated alternatives
ExtensityAI/symbolicai
A neurosymbolic perspective on LLMs
TIGER-AI-Lab/MMLU-Pro
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding...
deep-symbolic-mathematics/LLM-SR
[ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on Scientific Equation...
microsoft/interwhen
A framework for verifiable reasoning with language models.
zhudotexe/fanoutqa
Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language...