sylvain-wei/24-Game-Reasoning

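A very simple reproduction of DeepSeek-R1-Zero and DeepSeek-R1, using the 24 Game as an example. It applies zero-RL, SFT, and SFT+RL to elicit self-verification and reflection in LLMs.

About: Clean, minimal, accessible reproduction of DeepSeek R1-Zero and DeepSeek R1.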

/ 100

Experimental

This project helps AI researchers and developers improve how large language models (LLMs) solve mathematical problems. It takes an existing LLM and trains it on the classic 24 Game with three setups: zero-RL, SFT, and SFT followed by RL. The intended result is an LLM with stronger mathematical reasoning and self-correction, skills that may carry over to other logical tasks.
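The 24 Game suits RL training because an answer can be checked mechanically. Below is a minimal sketch of such a check, assuming the model's final answer is a plain arithmetic expression. The names `reward_24` and `_evaluate` are hypothetical and are not taken from the repository, whose actual reward function may differ.

```python
import ast
from fractions import Fraction

# Hypothetical helper names for illustration; not the repository's API.
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def _evaluate(node, used):
    """Evaluate an arithmetic AST exactly, recording each integer literal."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    # Anything else (names, calls, unary operators) is rejected.
    raise ValueError("unsupported expression")


def reward_24(expression, numbers, target=24):
    """Return 1.0 if the expression uses each given number exactly once
    and evaluates to the target, otherwise 0.0."""
    used = []
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    if sorted(used) != sorted(numbers):
        return 0.0
    return 1.0 if value == target else 0.0


if __name__ == "__main__":
    print(reward_24("(1 + 2 + 3) * 4", [1, 2, 3, 4]))
    print(reward_24("8 / (3 - 8 / 3)", [3, 3, 8, 8]))
    print(reward_24("6 * 4", [1, 2, 3, 4]))
```

Exact rational arithmetic via `Fraction` avoids floating-point error in puzzles such as 3, 3, 8, 8, where the solution passes through a non-integer intermediate value. Walking the parsed AST instead of calling `eval` keeps arbitrary model output from being executed.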

No commits in the last 6 months.

Use this if you are developing or fine-tuning large language models and need to enhance their mathematical reasoning and self-verification abilities.

Not ideal if you are a casual user looking for a ready-to-play 24 Game solver or a general-purpose AI, as this is a research-focused development tool.

LLM-fine-tuning, AI-research, mathematical-reasoning, reinforcement-learning, model-evaluation

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 7 / 25

Maturity 8 / 25

Community 6 / 25

Language

Python

License

—

Higher-rated alternatives

open-thought/reasoning-gym

[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Hmbown/Hegelion

Dialectical reasoning architecture for LLMs (Thesis → Antithesis → Synthesis)

LLM360/Reasoning360

A repo for open research on building large reasoning models

bowang-lab/BioReason

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | NeurIPS '25

TsinghuaC3I/Awesome-RL-for-LRMs

A Survey of Reinforcement Learning for Large Reasoning Models
