MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
This project provides a rigorous way to test and compare how well advanced AI models understand and reason across many academic disciplines. It takes college-level questions paired with heterogeneous image types (such as charts, diagrams, and tables) as input and reports the model's answer accuracy. Researchers, AI developers, and academics building or evaluating multimodal AI will find this useful; a minimal evaluation sketch follows the notes below.
Use this if you are developing advanced AI models and need a comprehensive, challenging benchmark to assess their ability to integrate visual and textual information and reason like a human expert.
Not ideal if you are looking for a simple, task-specific dataset for basic image recognition or natural language processing, as this benchmark focuses on complex, multi-disciplinary understanding.
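Below is a minimal evaluation sketch, assuming the benchmark's Hugging Face dataset ("MMMU/MMMU", with one config per subject such as "Accounting") and its documented fields ("question", "options", "answer", "image_1"). The my_model_answer function is a hypothetical stand-in for your model's inference call, not part of this repo.

import ast
from datasets import load_dataset

def my_model_answer(question, options, image):
    # Hypothetical placeholder: return your model's chosen letter, e.g. "A"/"B"/"C"/"D".
    return "A"

# Each subject is a separate config; the validation split has public answers, the test split does not.
ds = load_dataset("MMMU/MMMU", "Accounting", split="validation")

correct = 0
for ex in ds:
    options = ast.literal_eval(ex["options"])  # options are stored as a stringified Python list
    pred = my_model_answer(ex["question"], options, ex["image_1"])
    correct += pred == ex["answer"]

print(f"Accuracy on Accounting validation: {correct / len(ds):.3f}")

For a full MMMU score, the same loop would run over every subject config and average across them.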
Stars: 548
Forks: 49
Language: Python
License: Apache-2.0
Category: (none listed)
Last pushed: Feb 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MMMU-Benchmark/MMMU"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
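The same data can be fetched from Python. A minimal sketch using requests follows; only the URL comes from this page, and the JSON field names are undocumented, so the payload is printed for inspection rather than parsed.

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MMMU-Benchmark/MMMU"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # raise on HTTP errors (e.g. once the daily quota is exhausted)
data = resp.json()
print(data)  # inspect the payload; the schema is not documented on this page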
Related tools
pat-jj/DeepRetrieval
[COLM’25] DeepRetrieval — 🔥 Training Search Agent by RLVR with Retrieval Outcome
lupantech/MathVista
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
x66ccff/liveideabench
[Nature Communications] 🤖💡 LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea...
ise-uiuc/magicoder
[ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct
sherryzyh/physical_reasoning_toolkit
A Python toolkit for physical reasoning in LLMs and VLMs. This toolkit streamlines access to...