mims-harvard/CUREBench

CUREBench @ NeurIPS 2025: Benchmarking AI reasoning for therapeutic decision-making at scale

Quality score: 43 / 100 (Emerging)

This project offers a starter kit for participants in the CURE-Bench biomedical AI competition. It helps researchers and AI practitioners evaluate how well their models perform on complex therapeutic decision-making tasks. You provide medical case data in JSONL format along with your AI model's configuration, and it generates a structured CSV submission file containing your model's predictions and reasoning for evaluation.
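The JSONL-to-CSV flow described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the starter kit's actual code: the field names (`id`, `question`, `prediction`, `reasoning`) and the `predict` callback are placeholders, since the competition's real schema is defined by the starter kit itself.

```python
import csv
import io
import json

def cases_to_submission(jsonl_text, predict):
    """Read medical cases from JSONL text and return a CSV submission string.

    `predict` is any callable mapping a case dict to (answer, reasoning);
    the column names below are assumed, not the official CURE-Bench schema.
    """
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["id", "prediction", "reasoning"])  # assumed columns
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines in the JSONL file
        case = json.loads(line)
        answer, reasoning = predict(case)
        writer.writerow([case["id"], answer, reasoning])
    return out.getvalue()

# Toy stand-in for a real model's prediction call.
def dummy_predict(case):
    return "option_a", f"Chose option_a for {case['id']}"

sample = '{"id": "case-1", "question": "Best first-line therapy?"}\n'
print(cases_to_submission(sample, dummy_predict))
```

In practice the starter kit drives this loop with your model's configuration; the sketch only shows the shape of the transformation, one JSONL case in, one CSV row out.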


Use this if you are participating in the CURE-Bench competition and need a straightforward way to generate and submit your AI model's predictions for therapeutic reasoning tasks.

Not ideal if you are looking for a general-purpose AI model evaluation framework outside the specific context and data format of the CURE-Bench competition.

Tags: AI-in-medicine, therapeutic-decision-making, biomedical-AI, AI-competition, medical-reasoning

No license · No package · No dependents

Score breakdown:
Maintenance: 6 / 25
Adoption: 10 / 25
Maturity: 7 / 25
Community: 20 / 25


Stars: 129
Forks: 31
Language: Python
License: none
Last pushed: Dec 06, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/mims-harvard/CUREBench"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
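The same endpoint can be called from Python with the standard library. The URL pattern below mirrors the curl command above; the shape of the JSON response is not documented here, so the sketch simply returns whatever the API sends back.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_endpoint(owner, repo):
    """Build the quality-score API URL for a GitHub owner/repo pair."""
    return f"{API_BASE}/{quote(owner)}/{quote(repo)}"

def fetch_quality(owner, repo):
    """Fetch and parse the quality record (response schema undocumented)."""
    with urllib.request.urlopen(quality_endpoint(owner, repo)) as resp:
        return json.load(resp)

print(quality_endpoint("mims-harvard", "CUREBench"))
```

For higher rate limits, the free API key would presumably be passed in a header or query parameter; consult the API's own documentation for the exact mechanism.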