eth-lre/mathtutorbench
Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors, EMNLP 2025 Oral
This project provides a standardized way to test how well AI language models can act as math tutors. Given a math-tutoring model, it produces a detailed report on the model's performance across seven key teaching skills, such as problem-solving assistance and mistake correction. Educators, instructional designers, and AI developers building educational tools can use it to understand and improve their AI tutors.
Use this if you are developing or evaluating an AI model designed to tutor students in mathematics and need a comprehensive, automated way to assess its pedagogical effectiveness.
Not ideal if you are looking for a general-purpose AI evaluation tool or a benchmark for educational AI outside of mathematics.
Stars: 32
Forks: 10
Language: Python
License: —
Category: —
Last pushed: Nov 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/eth-lre/mathtutorbench"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
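For scripted access, here is a minimal Python sketch hitting the same public endpoint as the curl command above. It assumes the endpoint returns JSON; the response fields are not documented here, so the script simply prints whatever comes back.

import requests

# Same public endpoint as the curl example (100 requests/day without a key).
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/eth-lre/mathtutorbench"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # fail loudly on HTTP errors or rate limiting
print(resp.json())       # assumption: the API responds with a JSON body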
Higher-rated alternatives
sierra-research/tau2-bench
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
bigcode-project/bigcodebench
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems
THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)