TsinghuaC3I/MedXpertQA
[ICML 2025] MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
This project offers a comprehensive benchmark for evaluating how well advanced AI models can understand and reason in complex medical scenarios. It provides a dataset of challenging medical exam questions, including text-based cases and multimodal examples with patient records and images. Medical researchers, AI model developers, and academic institutions can use this to rigorously test and compare the capabilities of different AI systems in a clinically relevant context.
142 stars. No commits in the last 6 months.
Use this if you need to evaluate the expert-level medical reasoning and understanding of large language models or multimodal AI systems, especially with clinically relevant and difficult questions.
Not ideal if you are looking for a simple medical question-answering dataset without the need for expert-level reasoning or multimodal input.
Stars: 142
Forks: 10
Language: Python
License: MIT
Category:
Last pushed: Jul 17, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/TsinghuaC3I/MedXpertQA"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
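The curl command above can also be issued from Python. A minimal sketch using only the standard library, assuming the endpoint returns JSON (the exact response fields are not documented on this page, so treat any key names as assumptions):

```python
import json
import urllib.request

# Endpoint base taken from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def tool_url(owner: str, repo: str) -> str:
    """Build the API URL for a repository's quality record."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_tool(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON record for one repository.

    Subject to the page's rate limits: 100 requests/day without a key.
    """
    with urllib.request.urlopen(tool_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_tool("TsinghuaC3I", "MedXpertQA")
    # Field names below are guesses based on the stats shown above.
    print(data.get("stars"), data.get("forks"))
```

No API key is needed for this request volume; a key would presumably be passed as a header or query parameter, but that detail is not shown on this page.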
Higher-rated alternatives
Sam-Osian/PFD-toolkit
Analyse Prevention of Future Death (PFD) reports with AI
SmartFlowAI/EmoLLM
Mental-health large language model (LLM x Mental Health): pre- and post-training, dataset, evaluation, deployment, and RAG, with...
AI-in-Health/MedLLMsPracticalGuide
[Nature Reviews Bioengineering🔥] Application of Large Language Models in Medicine. A curated...
Zlasejd/HuangDI
Huang-Di model repository: a large model for question answering over classical traditional Chinese medicine texts, based on Ziya-LLaMA-13B-V1.
mims-harvard/Madrigal
Madrigal: Multimodal AI predicts clinical outcomes of drug combinations from preclinical data