alibaba-damo-academy/MedEvalKit

MedEvalKit: A Unified Medical Evaluation Framework

Quality score: 41 / 100 (Emerging)

MedEvalKit helps medical researchers and AI developers rigorously test how well large AI models understand and reason in medical contexts. You input a medical AI model (like a specialized GPT or vision model) and a medical benchmark dataset (such as medical QA tests or X-ray interpretations). The output is a detailed performance report showing how accurately the AI model answers questions or interprets medical data.

Use this if you need to objectively compare and evaluate the performance of different large medical AI models on various medical tasks and datasets.

Not ideal if you are looking for a tool to train or fine-tune medical AI models, or if you need to develop new medical datasets.

Tags: medical AI evaluation, large language model (LLM) benchmarking, medical imaging interpretation, clinical question answering, medical natural language processing
No License · No Package · No Dependents
Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 7 / 25
Community: 14 / 25


Stars: 212
Forks: 20
Language: Python
License: none
Last pushed: Feb 24, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/alibaba-damo-academy/MedEvalKit"

Open to everyone: 100 requests/day with no key needed. A free API key raises the limit to 1,000 requests/day.
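The same endpoint can be called from any HTTP client. A minimal Python sketch is below; note that the JSON field names (`score`, `tier`, `breakdown`, etc.) are an assumption for illustration, not a documented schema, so adapt them to the actual response.

```python
import json
from urllib.parse import quote

# Base endpoint, as shown in the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-report URL for a given GitHub owner/repo."""
    return f"{API_BASE}/{quote(owner)}/{quote(repo)}"

# Hypothetical response payload; the real API may differ.
sample_response = json.loads("""
{
  "score": 41,
  "tier": "Emerging",
  "breakdown": {"maintenance": 10, "adoption": 10, "maturity": 7, "community": 14},
  "stars": 212,
  "forks": 20
}
""")

url = quality_url("alibaba-damo-academy", "MedEvalKit")
# The four component scores sum to the overall score (10 + 10 + 7 + 14 = 41).
total = sum(sample_response["breakdown"].values())
print(url)
print(f"{sample_response['tier']}: {sample_response['score']} / 100")
```

In a real script you would fetch `url` (e.g. with `urllib.request.urlopen` or `requests.get`) instead of using the hard-coded sample.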