alibaba-damo-academy/MedEvalKit

MedEvalKit: A Unified Medical Evaluation Framework

Quality score: 41 / 100 (Emerging)

MedEvalKit helps medical researchers and AI developers rigorously test how well large AI models understand and reason in medical contexts. You input a medical AI model (like a specialized GPT or vision model) and a medical benchmark dataset (such as medical QA tests or X-ray interpretations). The output is a detailed performance report showing how accurately the AI model answers questions or interprets medical data.

Use this if you need to objectively compare and evaluate the performance of different large medical AI models on various medical tasks and datasets.

Not ideal if you are looking for a tool to train or fine-tune medical AI models, or if you need to develop new medical datasets.

Tags: medical AI evaluation, large language model (LLM) benchmarking, medical imaging interpretation, clinical question answering, medical natural language processing
No License · No Package · No Dependents
Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 7 / 25
Community: 14 / 25


Stars: 212
Forks: 20
Language: Python
License: none
Last pushed: Feb 24, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/alibaba-damo-academy/MedEvalKit"

Open to everyone: 100 requests/day with no key needed. A free API key raises the limit to 1,000 requests/day.
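The same endpoint can be called from any HTTP client. A minimal Python sketch is below; note that the JSON field names (`score`, `tier`, `breakdown`, etc.) are an assumption for illustration, not a documented schema, so adapt them to the actual response.

```python
import json
from urllib.parse import quote

# Base endpoint, as shown in the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-report URL for a given GitHub owner/repo."""
    return f"{API_BASE}/{quote(owner)}/{quote(repo)}"

# Hypothetical response payload; the real API may differ.
sample_response = json.loads("""
{
  "score": 41,
  "tier": "Emerging",
  "breakdown": {"maintenance": 10, "adoption": 10, "maturity": 7, "community": 14},
  "stars": 212,
  "forks": 20
}
""")

url = quality_url("alibaba-damo-academy", "MedEvalKit")
# The four component scores sum to the overall score (10 + 10 + 7 + 14 = 41).
total = sum(sample_response["breakdown"].values())
print(url)
print(f"{sample_response['tier']}: {sample_response['score']} / 100")
```

In a real script you would fetch `url` (e.g. with `urllib.request.urlopen` or `requests.get`) instead of using the hard-coded sample.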