wshi83/MedAgentGym
[ICLR'26] MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale
MedAgentGym is a training environment for improving how large language models (LLMs) reason about and generate code for medical tasks. It takes anonymized electronic health record (EHR) data and medical task descriptions as input, then evaluates whether the LLM produces correct, executable code for each medical reasoning problem. Researchers and AI developers working on medical AI can use it to build more capable AI assistants for healthcare.
Use this if you are a researcher or AI developer working on training or fine-tuning large language models to perform complex, code-based medical reasoning tasks.
Not ideal if you are a clinician looking for a ready-to-use medical diagnostic tool, as this is an environment for developing and evaluating AI, not a clinical application.
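To make "correct, executable code" concrete, here is a minimal sketch of execution-based checking. It is not MedAgentGym's actual API: the function names `run_candidate` and `score`, the convention that generated code stores its result in a variable called `answer`, and the toy records are all assumptions made for illustration.

```python
from typing import Any


def run_candidate(code: str, records: list[dict[str, Any]]) -> tuple[bool, Any]:
    """Execute candidate code and return (ran_ok, value of `answer`).

    Hypothetical convention: the code reads `records` and assigns `answer`.
    """
    namespace: dict[str, Any] = {"records": records}
    try:
        # No sandboxing here; a real environment would isolate execution.
        exec(code, namespace)
    except Exception as exc:
        return False, repr(exc)
    return True, namespace.get("answer")


def score(code: str, records: list[dict[str, Any]], expected: Any) -> bool:
    """A candidate passes only if it runs and its answer matches."""
    ok, value = run_candidate(code, records)
    return ok and value == expected


# Made-up toy records standing in for anonymized EHR rows.
records = [
    {"patient_id": 1, "heart_rate": 88},
    {"patient_id": 2, "heart_rate": 112},
    {"patient_id": 3, "heart_rate": 130},
]

# Task: "How many patients have a heart rate above 100?"
candidate = "answer = sum(1 for r in records if r['heart_rate'] > 100)"
print(score(candidate, records, expected=2))
```

Code that raises an exception or returns the wrong value scores as a failure, which is the kind of binary signal a training loop could consume.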
Stars: 84
Forks: 5
Language: Python
License: —
Category: (not listed)
Last pushed: Feb 02, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/wshi83/MedAgentGym"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
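The same request can be made from Python using only the standard library. This is a sketch under assumptions: the endpoint path is copied from the curl command above, but the response is assumed to be JSON (the listing does not document its format), and the helper names `build_url` and `fetch_quality` are hypothetical. How an API key is passed is not stated here, so the sketch covers keyless access only.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

# Base path taken from the curl example in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> Any:
    """Fetch the record for a repository (assumes a JSON response body)."""
    request = urllib.request.Request(
        build_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_url("wshi83", "MedAgentGym"))
```

Calling `fetch_quality("wshi83", "MedAgentGym")` performs the network request; with a limit of 100 keyless requests per day, caching the result locally is a sensible default.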
Higher-rated alternatives
ai4co/reevo
[NeurIPS 2024] ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution
SALT-NLP/collaborative-gym
Framework and toolkits for building and evaluating collaborative agents that can work together...
Gen-Verse/LatentMAS
Latent Collaboration in Multi-Agent Systems
lean-dojo/LeanCopilot
LLMs as Copilots for Theorem Proving in Lean
WooooDyy/AgentGym-RL
Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon...