open-thought/reasoning-gym

[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Verified

This project helps AI researchers and machine learning engineers generate large, varied datasets for training models on reasoning tasks. Given a requested problem type (such as algebra or logic puzzles), it procedurally generates an effectively unlimited stream of unique questions, each with an answer that can be checked algorithmically. The primary users are those developing and evaluating large language models (LLMs), especially with reinforcement learning.

1,367 stars. Actively maintained with 7 commits in the last 30 days. Available on PyPI.

Use this if you need an endless supply of procedurally generated, algorithmically verifiable reasoning problems to train or benchmark your AI models.

Not ideal if you are looking for a pre-built, static dataset of real-world scenarios or if you are not working with advanced AI model training.
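
To make "procedurally generated, algorithmically verifiable" concrete, here is a minimal Python sketch of the general idea. It does not use the reasoning-gym package or its API. The names `Problem`, `generate_problems`, and `score_answer` are hypothetical, chosen only for illustration, and the exact-match scoring rule is an assumption.

```python
import random
from dataclasses import dataclass
from itertools import islice


@dataclass
class Problem:
    question: str
    answer: str  # ground truth, known by construction


def generate_problems(seed: int):
    """Yield an unbounded stream of linear-equation problems.

    Hypothetical generator for illustration; the solution x is sampled
    first, so every question has a known, checkable answer.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    while True:
        a = rng.randint(2, 9)
        x = rng.randint(-20, 20)
        b = rng.randint(1, 50)
        c = a * x + b
        yield Problem(question=f"Solve for x: {a}*x + {b} = {c}", answer=str(x))


def score_answer(problem: Problem, model_output: str) -> float:
    """Return 1.0 on an exact match after trimming whitespace, else 0.0."""
    return 1.0 if model_output.strip() == problem.answer else 0.0


if __name__ == "__main__":
    # Take a finite slice of the endless stream and score a perfect answer.
    for problem in islice(generate_problems(seed=42), 3):
        print(problem.question, "| reward:", score_answer(problem, problem.answer))
```

Because the answer is fixed before the question text is built, the reward needs no human labels or learned judge. That property is what makes this kind of data usable for reinforcement learning with verifiable rewards.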

AI-model-training reinforcement-learning language-model-evaluation synthetic-data-generation reasoning-benchmarking

Maintenance 17 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 18 / 25

Stars 1,367

Forks 114

Language Python

License Apache-2.0

Related tools

Hmbown/Hegelion

Dialectical reasoning architecture for LLMs (Thesis → Antithesis → Synthesis)

LLM360/Reasoning360

A repo for open research on building large reasoning models

bowang-lab/BioReason

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | NeurIPS '25

TsinghuaC3I/Awesome-RL-for-LRMs

A Survey of Reinforcement Learning for Large Reasoning Models

Peiyang-Song/Awesome-LLM-Reasoning-Failures

Repo for "Large Language Model Reasoning Failures"
