MMStar-Benchmark/MMStar

[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"

Quality score: 24 / 100 (Experimental)

This project helps AI researchers and developers assess the true capabilities of Large Vision-Language Models (LVLMs). It takes your LVLM's evaluation results, with and without visual input, and produces metrics that reveal how much the visual component genuinely contributes, flagging cases where performance is overestimated because the questions can be answered from text alone. It's designed for those building and refining multimodal AI models.
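In the paper's terms, the headline numbers are multi-modal gain (how much accuracy the images add) and multi-modal leakage (how much of the image-free score the LLM backbone alone cannot explain). A minimal sketch of that computation, assuming those definitions; the helper name and the example scores are hypothetical:

# Hypothetical helper, not the repo's API: multi-modal gain (MG) and
# multi-modal leakage (ML) from three accuracies of the same model family.
def mmstar_metrics(acc_with_visual: float,
                   acc_without_visual: float,
                   acc_llm_backbone: float) -> dict:
    """MG/ML in percentage points, rounded for display."""
    mg = acc_with_visual - acc_without_visual             # what the images actually add
    ml = max(0.0, acc_without_visual - acc_llm_backbone)  # score not explained by text alone
    return {"MG": round(mg, 2), "ML": round(ml, 2)}

# Example with made-up scores: 55.2% with images, 38.1% without, 35.0% bare LLM
print(mmstar_metrics(55.2, 38.1, 35.0))  # {'MG': 17.1, 'ML': 3.1}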

204 stars. No commits in the last 6 months.

Use this if you are developing or evaluating Large Vision-Language Models and want an accurate measure of how much visual information contributes to their performance, ensuring the visual input is truly indispensable for the task.

Not ideal if you are evaluating models that only process text or only process images, as it specifically focuses on the interplay between vision and language.

AI-model-evaluation multimodal-AI vision-language-models AI-benchmarking model-performance-analysis
No License · Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 6 / 25

Stars: 204
Forks: 5
Language: Python
License: None
Last pushed: Sep 26, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/MMStar-Benchmark/MMStar"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
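The same data can be fetched programmatically; a minimal Python sketch (the response schema is an assumption, so inspect the raw JSON before relying on specific fields):

import requests

url = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/MMStar-Benchmark/MMStar")
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail loudly on 4xx/5xx
print(resp.json())       # dump the payload to see the actual field names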