MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Overall score: 52 / 100 (Established)

This project provides a rigorous way to test and compare how well advanced AI models can understand and reason across many academic subjects. It presents models with college-level questions that include varied image types (such as charts and diagrams) and reports each model's accuracy in answering them. Researchers, AI developers, and academics building or evaluating multimodal AI will find this useful.

Use this if you are developing advanced AI models and need a comprehensive, challenging benchmark to assess their ability to integrate visual and textual information and reason like a human expert.

Not ideal if you are looking for a simple, task-specific dataset for basic image recognition or natural language processing, as this benchmark focuses on complex, multi-disciplinary understanding.

Tags: AI-model-evaluation · multimodal-AI · cognitive-science · expert-systems · academic-assessment
No Package · No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 16 / 25

How are scores calculated? The overall score is the sum of the four category scores above, each out of 25: 10 + 10 + 16 + 16 = 52.

Stars: 548
Forks: 49
Language: Python
License: Apache-2.0
Last pushed: Feb 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MMMU-Benchmark/MMMU"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
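If you would rather call the endpoint from Python than curl, here is a minimal sketch using only the standard library. The shape of the JSON payload is an assumption (presumably the fields shown above, such as stars, forks, and the category scores), so the loop simply prints whatever top-level fields the endpoint returns.

import json
import urllib.request

# Same endpoint as the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MMMU-Benchmark/MMMU"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)  # assumes the endpoint returns a JSON object

# The response schema is not documented here, so just print each
# top-level field that comes back rather than assuming key names.
for key, value in data.items():
    print(f"{key}: {value}")

This stays within the keyless 100-requests/day tier; no key handling is included because the header or parameter used for authenticated requests is not documented here.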