naivoder/MCTSr
Monte Carlo Tree Search Self-Refine (MCTSr)
This project helps AI researchers and developers evaluate the mathematical problem-solving abilities of large language models (LLMs). It runs Monte Carlo Tree Search Self-Refine against a local LLaMA instance: the model's answers to mathematical word problems and equations are generated, scored, and iteratively refined, and the output reports how well the model reasons on these challenging datasets.
No commits in the last 6 months.
Use this if you are an AI researcher or developer focused on understanding and improving the mathematical reasoning capabilities of LLMs.
Not ideal if you are looking for a fully polished, production-ready tool for general LLM evaluation without deep technical engagement.
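The refine-and-search loop the description refers to can be sketched as follows. This is an illustrative reconstruction, not the repository's actual code: the `refine` and `score` callables stand in for calls to the local LLaMA instance, and all names and parameters are assumptions.

```python
import math

def mctsr(initial_answer, refine, score, iterations=8, c=1.4):
    """Minimal MCTS Self-Refine sketch (illustrative, not the repo's code).

    refine(answer) -> a revised answer string.
    score(answer)  -> a reward in [0, 1] (e.g. an LLM self-evaluation).
    """
    # Each node holds an answer, its accumulated reward, and visit count.
    root = {"answer": initial_answer, "reward": score(initial_answer),
            "visits": 1, "children": []}

    def uct(node, parent_visits):
        # Standard UCT: exploit average reward, explore rarely-visited nodes.
        exploit = node["reward"] / node["visits"]
        explore = c * math.sqrt(math.log(parent_visits) / node["visits"])
        return exploit + explore

    for _ in range(iterations):
        # Selection: descend by UCT until reaching a leaf.
        path = [root]
        node = root
        while node["children"]:
            node = max(node["children"], key=lambda ch: uct(ch, node["visits"]))
            path.append(node)
        # Expansion + evaluation: refine the leaf's answer and score it.
        revised = refine(node["answer"])
        reward = score(revised)
        node["children"].append({"answer": revised, "reward": reward,
                                 "visits": 1, "children": []})
        # Backpropagation: push the reward up the selected path.
        for n in path:
            n["reward"] += reward
            n["visits"] += 1

    # Return the answer with the best average reward anywhere in the tree.
    best, stack = root, [root]
    while stack:
        n = stack.pop()
        if n["reward"] / n["visits"] > best["reward"] / best["visits"]:
            best = n
        stack.extend(n["children"])
    return best["answer"]
```

With a toy `refine` that increments a number and a `score` that rewards only the target value, the loop steers the search toward the higher-scoring refinement.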
Stars: 22
Forks: 2
Language: Python
License: —
Category:
Last pushed: Jul 06, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/naivoder/MCTSr"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
open-thought/reasoning-gym
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Hmbown/Hegelion
Dialectical reasoning architecture for LLMs (Thesis → Antithesis → Synthesis)
LLM360/Reasoning360
A repo for open research on building large reasoning models
bowang-lab/BioReason
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | NeurIPS '25
TsinghuaC3I/Awesome-RL-for-LRMs
A Survey of Reinforcement Learning for Large Reasoning Models