conceptmath/conceptmath

[ACL 2024 Findings] The official repo for "ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models".

Score: 14 / 100 (Experimental)

This project helps AI researchers and developers systematically evaluate the mathematical reasoning abilities of large language models (LLMs). You input a set of math problems and the LLM's responses, and it outputs a detailed breakdown of accuracy across various mathematical concepts, in both English and Chinese. This is ideal for those building or comparing LLMs for tasks requiring robust mathematical understanding.
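
To illustrate what a concept-wise breakdown involves, here is a minimal Python sketch that groups graded results by concept and computes per-concept accuracy. The record fields ("concept", "correct") are hypothetical placeholders, not ConceptMath's actual data schema, and it assumes responses have already been graded against reference answers.

from collections import defaultdict

def concept_accuracy(results):
    # Tally attempts and correct answers per concept,
    # then return the accuracy ratio for each concept.
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        totals[r["concept"]] += 1
        hits[r["concept"]] += int(r["correct"])
    return {c: hits[c] / totals[c] for c in totals}

# Example: two concepts, three graded responses (hypothetical data)
results = [
    {"concept": "fractions", "correct": True},
    {"concept": "fractions", "correct": False},
    {"concept": "geometry", "correct": True},
]
print(concept_accuracy(results))  # {'fractions': 0.5, 'geometry': 1.0}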

No commits in the last 6 months.

Use this if you need to understand not just whether an LLM gets a math problem right, but which specific mathematical concepts it struggles with.

Not ideal if you're looking for a general-purpose LLM evaluation framework beyond mathematical reasoning or a tool for daily mathematical calculations.

Tags: LLM-evaluation, AI-benchmarking, natural-language-processing, mathematical-AI, model-assessment
No License · Stale (6 months) · No Package · No Dependents
Score breakdown (each component is scored out of 25; the four components sum to the overall 0 + 6 + 8 + 0 = 14 / 100):
Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 8 / 25
Community: 0 / 25

Stars: 24
Forks: (not listed)
Language: Python
License: None
Last pushed: May 29, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/conceptmath/conceptmath"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
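
For programmatic use, the same data can be fetched from Python. This is a minimal sketch assuming the endpoint returns JSON; the exact response schema is not documented on this page.

import requests

url = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/conceptmath/conceptmath")
resp = requests.get(url, timeout=10)  # unauthenticated tier: 100 requests/day
resp.raise_for_status()               # fail loudly on HTTP errors
print(resp.json())                    # payload assumed to be JSON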