MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Overall score: 52 / 100 (Established)

This project provides a rigorous way to test and compare how well advanced AI models can understand and reason across many academic subjects. It presents models with college-level questions that include varied image types (such as charts and diagrams) and reports each model's accuracy in answering them. Researchers, AI developers, and academics building or evaluating multimodal AI will find this useful.

Use this if you are developing advanced AI models and need a comprehensive, challenging benchmark to assess their ability to integrate visual and textual information and reason like a human expert.

Not ideal if you are looking for a simple, task-specific dataset for basic image recognition or natural language processing, as this benchmark focuses on complex, multi-disciplinary understanding.

Tags: AI-model-evaluation · multimodal-AI · cognitive-science · expert-systems · academic-assessment
No Package · No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 16 / 25

How are scores calculated? The overall score is the sum of the four category scores above, each out of 25: 10 + 10 + 16 + 16 = 52.

Stars: 548
Forks: 49
Language: Python
License: Apache-2.0
Last pushed: Feb 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MMMU-Benchmark/MMMU"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
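If you would rather call the endpoint from Python than curl, here is a minimal sketch using only the standard library. The shape of the JSON payload is an assumption (presumably the fields shown above, such as stars, forks, and the category scores), so the loop simply prints whatever top-level fields the endpoint returns.

import json
import urllib.request

# Same endpoint as the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MMMU-Benchmark/MMMU"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)  # assumes the endpoint returns a JSON object

# The response schema is not documented here, so just print each
# top-level field that comes back rather than assuming key names.
for key, value in data.items():
    print(f"{key}: {value}")

This stays within the keyless 100-requests/day tier; no key handling is included because the header or parameter used for authenticated requests is not documented here.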