RaptorMai/MLLM-CompBench

[NeurIPS'25] MLLM-CompBench evaluates the comparative reasoning of MLLMs with 40K image pairs and questions across eight dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. The benchmark covers diverse visual domains, including animals, fashion, sports, and scenes.

Score: 31 / 100 (Emerging)

This project provides a ready-made dataset and questions to test how well AI models can compare two images. It takes in pairs of images from diverse categories like fashion, animals, and scenes, along with questions asking for comparisons related to visual traits, states, emotions, or quantities. The output helps evaluate the model's ability to identify subtle differences or similarities, which is useful for AI researchers and developers working on visual intelligence.
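To make the input/output shape concrete, here is a minimal sketch of what an evaluation loop over such a benchmark might look like. It assumes a JSONL annotation file with image_1, image_2, question, and answer fields and a user-supplied ask_model function; these names are illustrative assumptions, not the repository's actual schema or API.

```python
import json

def evaluate(annotation_path, ask_model):
    """Score a model on two-image comparison questions.

    ask_model(image_1_path, image_2_path, question) should return the model's
    predicted answer as a string. The JSONL field names used below are
    hypothetical, chosen only to illustrate the pair-plus-question format.
    """
    correct = total = 0
    with open(annotation_path) as f:
        for line in f:
            item = json.loads(line)
            prediction = ask_model(item["image_1"], item["image_2"], item["question"])
            correct += int(prediction.strip().lower() == item["answer"].strip().lower())
            total += 1
    return correct / total if total else 0.0
```

Exact-match scoring is only one option; the key point is that each benchmark item pairs two images with a single comparative question and a ground-truth answer.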

No commits in the last 6 months.

Use this if you are a researcher or AI developer who needs to rigorously benchmark the comparative reasoning capabilities of your multimodal AI models using a large, human-annotated dataset.

Not ideal if you are looking for a tool to train models from scratch or to perform image comparisons directly without a focus on AI model evaluation.

Tags: AI model evaluation, computer vision research, multimodal AI, image analysis, benchmark datasets
Badges: Stale (6 months), No Package, No Dependents
Maintenance: 2 / 25
Adoption: 8 / 25
Maturity: 16 / 25
Community: 5 / 25

The four sub-scores sum to the overall score of 31 / 100.


Stars: 44
Forks: 2
Language: Jupyter Notebook
License: (not specified)
Last pushed: Apr 21, 2025
Commits (last 30 days): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/RaptorMai/MLLM-CompBench"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
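If you prefer to consume the endpoint programmatically rather than via curl, a small Python sketch along these lines should work. The response field names (score, maintenance, and so on) are assumptions based on the figures shown above, not a documented schema.

```python
import requests

# Same endpoint as the curl example above.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/RaptorMai/MLLM-CompBench"

resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()

# Field names are guesses based on the values displayed on this page.
print(data.get("score"))        # overall score, e.g. 31
print(data.get("maintenance"))  # sub-score, e.g. 2
```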