eth-sri/matharena

Evaluation of LLMs on the latest math competitions

Score: 52/100 (Established)

This is a platform for evaluating how well large language models (LLMs) perform on challenging math competitions and olympiads. You provide a competition (such as AIME or HMMT) and one or more LLMs, and it outputs detailed evaluation results, including whether each model's answers are correct and the reasoning traces it produced. Anyone researching or developing models for complex problem-solving can use it to benchmark them.
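
Below is a minimal sketch of the evaluate-and-grade loop such a harness runs. All names here (query_model, evaluate) are hypothetical placeholders for illustration, not MathArena's actual API; see the repository for its real entry points and competition configs.

problems = [
    # Toy stand-in for a competition problem with a known final answer.
    {"statement": "Compute 2^10.", "answer": "1024"},
]

def query_model(model: str, prompt: str) -> str:
    # Stub: a real harness would call the model's API here and capture
    # both the final answer and the full reasoning trace.
    return "1024"

def evaluate(model: str) -> float:
    # Exact-match grading on the final answer; a real harness also
    # normalizes formatting (LaTeX, whitespace) before comparing.
    correct = sum(
        query_model(model, p["statement"]).strip() == p["answer"]
        for p in problems
    )
    return correct / len(problems)

print(evaluate("demo-model"))  # 1.0 on the toy problem above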

Use this if you need to rigorously test and compare the mathematical reasoning capabilities of various LLMs on standardized competition problems.

Not ideal if you're looking for a general-purpose math solver for everyday calculations or a tool for teaching basic math concepts.

Tags: ai-model-evaluation, mathematical-reasoning, llm-benchmarking, ai-research, competitive-math
No package. No dependents.
Maintenance: 10/25
Adoption: 10/25
Maturity: 16/25
Community: 16/25

How are scores calculated?
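
The methodology isn't spelled out on this card, but the visible numbers are consistent with a plain sum of four 25-point categories. A quick check under that assumption:

subscores = {"Maintenance": 10, "Adoption": 10, "Maturity": 16, "Community": 16}
total = sum(subscores.values())
print(total)  # 52, matching the 52/100 overall score (4 categories x 25 = 100)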

Stars: 229
Forks: 29
Language: Python
License: MIT
Last pushed: Mar 10, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/eth-sri/matharena"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
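
The same endpoint works from Python. A minimal sketch, assuming the endpoint returns JSON; the response schema isn't documented here, so inspect the payload before relying on specific keys:

import requests

# Endpoint taken from the curl example above.
url = "https://pt-edge.onrender.com/api/v1/quality/eth-sri/matharena"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()
print(data)  # inspect the returned fields before depending on any of them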