TsinghuaC3I/MedXpertQA
[ICML 2025] MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
This project offers a comprehensive benchmark for evaluating how well advanced AI models can understand and reason in complex medical scenarios. It provides a dataset of challenging medical exam questions, including text-based cases and multimodal examples with patient records and images. Medical researchers, AI model developers, and academic institutions can use this to rigorously test and compare the capabilities of different AI systems in a clinically relevant context.
142 stars. No commits in the last 6 months.
Use this if you need to evaluate the expert-level medical reasoning and understanding of large language models or multimodal AI systems, especially with clinically relevant and difficult questions.
Not ideal if you are looking for a simple medical question-answering dataset without the need for expert-level reasoning or multimodal input.
Stars: 142
Forks: 10
Language: Python
License: MIT
Category:
Last pushed: Jul 17, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/TsinghuaC3I/MedXpertQA"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
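The curl command above can also be issued from Python. A minimal sketch using only the standard library, assuming the endpoint returns JSON (the exact response fields are not documented on this page, so treat any key names as assumptions):

```python
import json
import urllib.request

# Endpoint base taken from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def tool_url(owner: str, repo: str) -> str:
    """Build the API URL for a repository's quality record."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_tool(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON record for one repository.

    Subject to the page's rate limits: 100 requests/day without a key.
    """
    with urllib.request.urlopen(tool_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_tool("TsinghuaC3I", "MedXpertQA")
    # Field names below are guesses based on the stats shown above.
    print(data.get("stars"), data.get("forks"))
```

No API key is needed for this request volume; a key would presumably be passed as a header or query parameter, but that detail is not shown on this page.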
Higher-rated alternatives
Sam-Osian/PFD-toolkit
Analyse Prevention of Future Death (PFD) reports with AI
SmartFlowAI/EmoLLM
Mental-health large language model (LLM x Mental Health): pre- and post-training, dataset, evaluation, deployment, and RAG, with...
AI-in-Health/MedLLMsPracticalGuide
[Nature Reviews Bioengineering🔥] Application of Large Language Models in Medicine. A curated...
Zlasejd/HuangDI
Huang-Di model repository: a large model for question answering over classical traditional Chinese medicine texts, based on Ziya-LLaMA-13B-V1.
mims-harvard/Madrigal
Madrigal: Multimodal AI predicts clinical outcomes of drug combinations from preclinical data