TsinghuaC3I/MedXpertQA

[ICML 2025] MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Score: 39 / 100 (Emerging)

This project offers a comprehensive benchmark for evaluating how well advanced AI models can understand and reason in complex medical scenarios. It provides a dataset of challenging medical exam questions, including text-based cases and multimodal examples with patient records and images. Medical researchers, AI model developers, and academic institutions can use this to rigorously test and compare the capabilities of different AI systems in a clinically relevant context.

142 stars. No commits in the last 6 months.

Use this if you need to evaluate the expert-level medical reasoning and understanding of large language models or multimodal AI systems, especially with clinically relevant and difficult questions.

Not ideal if you are looking for a simple medical question-answering dataset without the need for expert-level reasoning or multimodal input.

medical-evaluation clinical-reasoning medical-AI medical-education healthcare-research
Status: Stale (6 months) · No package published · No dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 11 / 25

How are scores calculated?
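The page does not document the formula, but the four subscores above sum exactly to the overall score: 2 + 10 + 16 + 11 = 39. A minimal Python sketch under that assumption (the category names and values come from the listing; the plain-summation rule is inferred from the numbers, not confirmed by the site):

# Sketch: assumes the overall score is the plain sum of the four
# subscores shown on this page. This rule is inferred, not documented.
subscores = {
    "Maintenance": 2,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 11,
}

overall = sum(subscores.values())  # each subscore is out of 25
print(f"{overall} / 100")          # -> 39 / 100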

Stars: 142
Forks: 10
Language: Python
License: MIT
Last pushed: Jul 17, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/TsinghuaC3I/MedXpertQA"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
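For scripted access, a minimal Python sketch using the requests library (the response schema is not documented on this page, so the snippet simply prints the raw JSON payload):

import requests

# Quality endpoint shown above; works without a key at 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/TsinghuaC3I/MedXpertQA"

response = requests.get(URL, timeout=10)
response.raise_for_status()  # fail loudly on 4xx/5xx

# Field names are undocumented here, so print the payload as-is.
print(response.json())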