alibaba-damo-academy/MedEvalKit
MedEvalKit: A Unified Medical Evaluation Framework
MedEvalKit helps medical researchers and AI developers rigorously test how well large AI models understand and reason in medical contexts. You input a medical AI model (like a specialized GPT or vision model) and a medical benchmark dataset (such as medical QA tests or X-ray interpretations). The output is a detailed performance report showing how accurately the AI model answers questions or interprets medical data.
Use this if you need to objectively compare and evaluate the performance of different large medical AI models on various medical tasks and datasets.
Not ideal if you are looking for a tool to train or fine-tune medical AI models, or if you need to develop new medical datasets.
Stars: 212
Forks: 20
Language: Python
License: —
Category:
Last pushed: Feb 24, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/alibaba-damo-academy/MedEvalKit"
Open to everyone: 100 requests/day with no key needed. Get a free API key for 1,000 requests/day.
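The same lookup can be scripted instead of using curl. A minimal Python sketch, assuming the endpoint returns a JSON payload; the helper names and the idea that response fields mirror the card above are assumptions, not part of the documented API:

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_repo_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-API URL for one repository entry."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_repo_stats(category: str, owner: str, repo: str,
                     timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON stats payload.

    No API key is sent, so this uses the anonymous 100 requests/day tier.
    The structure of the returned dict is an assumption.
    """
    url = build_repo_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example: the URL for the MedEvalKit card shown above.
url = build_repo_url("ml-frameworks", "alibaba-damo-academy", "MedEvalKit")
# → https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/alibaba-damo-academy/MedEvalKit
```

Calling `fetch_repo_stats("ml-frameworks", "alibaba-damo-academy", "MedEvalKit")` would perform the same request as the curl command.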
Higher-rated alternatives
microsoft/eureka-ml-insights
A framework for standardizing evaluations of large foundation models, beyond single-score...
mims-harvard/SPECTRA
SPECTRA: Spectral framework for evaluation of biomedical AI models
AntGamerMD21/eval-guide
📊 Explore ML evaluation metrics through interactive notebooks with pre-run outputs for hands-on...