Math Reasoning Datasets Transformer Models

There are 44 projects tracked in the math reasoning datasets category. One scores above 70 (Verified tier). The highest-rated is ExtensityAI/symbolicai at 71/100 with 1,677 stars. One of the top 10 is actively maintained.

Get all 44 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=math-reasoning-datasets&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
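The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The endpoint and the query parameters `domain`, `subcategory`, and `limit` are taken from the curl example above; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the sketch returns the decoded payload without assuming any field names. The curl example uses `limit=20`; whether a larger value returns all 44 projects in one call is an assumption to verify against the API.

```python
import json
import urllib.parse
import urllib.request

# Endpoint copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers",
              subcategory="math-reasoning-datasets",
              limit=20):
    """Build the request URL (hypothetical helper, not part of the API)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Send a GET request and decode the JSON body.

    The response schema is not described on this page, so the decoded
    object is returned as-is (hypothetical helper).
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs a network request, counts toward the daily quota):
# data = fetch_projects(build_url())
# print(json.dumps(data, indent=2)[:500])
```

No API key is passed here because the anonymous tier needs none. How a key would be supplied for the 1,000 requests/day tier is not stated on this page, so it is left out.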

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | ExtensityAI/symbolicai | A neurosymbolic perspective on LLMs | 71 | Verified |
| 2 | TIGER-AI-Lab/MMLU-Pro | The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task... | 56 | Established |
| 3 | deep-symbolic-mathematics/LLM-SR | [ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on... | 49 | Emerging |
| 4 | microsoft/interwhen | A framework for verifiable reasoning with language models. | 44 | Emerging |
| 5 | zhudotexe/fanoutqa | Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering... | 44 | Emerging |
| 6 | xlang-ai/Binder | [ICLR 2023] Code for the paper "Binding Language Models in Symbolic Languages" | 43 | Emerging |
| 7 | HiThink-Research/MME-Finance | [MM 2025] A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning | 40 | Emerging |
| 8 | yifanzhang-pro/AutoMathText | [ACL 2025 Findings] Autonomous Data Selection with Zero-shot Generative... | 39 | Emerging |
| 9 | DAMO-NLP-SG/LLM-Multilingual-Knowledge-Boundaries | [ACL 2025] Analyzing LLMs' Multilingual Knowledge Boundary Cognition Across... | 37 | Emerging |
| 10 | AlphaPav/mem-kk-logic | On Memorization of Large Language Models in Logical Reasoning | 37 | Emerging |
| 11 | TIGER-AI-Lab/StructLM | Code and data for "StructLM: Towards Building Generalist Models for... | 37 | Emerging |
| 12 | princeton-pli/AdaptMI | [COLM 2025] Adaptive Skill-based In-context Math Instruction for Small... | 37 | Emerging |
| 13 | TIGER-AI-Lab/LongICLBench | Code and Data for "Long-context LLMs Struggle with Long In-context Learning"... | 36 | Emerging |
| 14 | declare-lab/LLM-PuzzleTest | This repository is maintained to release dataset and models for multimodal... | 36 | Emerging |
| 15 | TIGER-AI-Lab/MAmmoTH | Code and data for "MAmmoTH: Building Math Generalist Models through Hybrid... | 36 | Emerging |
| 16 | SeekingDream/DyCodeEval | Official repository of the ICML2025 paper “Dynamic Benchmarking of Reasoning... | 36 | Emerging |
| 17 | amazon-science/recode | Releasing code for "ReCode: Robustness Evaluation of Code Generation Models" | 35 | Emerging |
| 18 | akjindal53244/Arithmo | Small and Efficient Mathematical Reasoning LLMs | 35 | Emerging |
| 19 | google/curie | Code release for "CURIE: Evaluating LLMs On Multitask Scientific Long... | 34 | Emerging |
| 20 | martin-wey/CodeUltraFeedback | CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) | 33 | Emerging |
| 21 | ryokamoi/llm-self-correction-papers | List of papers on Self-Correction of LLMs. | 31 | Emerging |
| 22 | surrey-nlp/LLM4MT_eval | This repository is for our paper "What do large language model need for... | 31 | Emerging |
| 23 | QwenLM/PolyMath | [NeurIPS 2025 D&B Track] Evaluation Code Repo for Paper "PolyMath:... | 31 | Emerging |
| 24 | conditionWang/FLNK | Federated Learning with New Knowledge -- explore to incorporate various new... | 30 | Emerging |
| 25 | reasoning-machines/CoCoGen | Language Models of Code are Few-Shot Commonsense Learners (EMNLP 2022) | 30 | Emerging |
| 26 | bobxwu/learning-from-rewards-llm-papers | A comprehensive collection of learning from rewards in the post-training and... | 30 | Emerging |
| 27 | neuro-symbolic-ai/explanation_based_ethical_reasoning | Code and data for Paper "Enhancing Ethical Explanations of Large Language... | 30 | Emerging |
| 28 | gersteinlab/Struc-Bench | [NAACL 2024] Struc-Bench: Are Large Language Models Good at Generating... | 30 | Emerging |
| 29 | zjunlp/DynamicKnowledgeCircuits | [ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits... | 29 | Experimental |
| 30 | kaistAI/LangBridge | [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision | 28 | Experimental |
| 31 | WooooDyy/MathCritique | Implementation for the research paper "Enhancing LLM Reasoning via Critique... | 27 | Experimental |
| 32 | merlerm/In-Context-Symbolic-Regression | Official code implementation for the ACL 2024 Student Research Workshop... | 27 | Experimental |
| 33 | YangLing0818/SuperCorrect-llm | [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought... | 27 | Experimental |
| 34 | joeljang/continual-knowledge-learning | [ICLR 2022] Towards Continual Knowledge Learning of Language Models | 27 | Experimental |
| 35 | UCSC-VLAA/vllm-safety-benchmark | [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in... | 25 | Experimental |
| 36 | MMStar-Benchmark/MMStar | [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on... | 24 | Experimental |
| 37 | TIGER-AI-Lab/TableCoT | The code and data for paper "Large Language Models are few(1)-shot Table... | 23 | Experimental |
| 38 | iiis-ai/IterativeQuestionComposing | [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing... | 23 | Experimental |
| 39 | Eleanor-H/MUSTARD | Code & data for ICLR 2024 spotlight paper: 🍯MUSTARD: Mastering Uniform... | 21 | Experimental |
| 40 | yahskapar/LLMs-and-Probabilistic-Reasoning | Data and software artifacts for the EMNLP 2024 (Main) paper "What Are the... | 20 | Experimental |
| 41 | yashmahe2020/math-tutor-research | Research on Large Language Model capabilities in mathematics tutoring and... | 19 | Experimental |
| 42 | Liz-Atlas/last_frame_whitepaper | A Modular Knowledge Transfer System for Large Language Models | 17 | Experimental |
| 43 | kreasof-ai/self-perturbation-learning | Imagine "2 truth and a lie", but formalized as ML training objective | 17 | Experimental |
| 44 | Shengyu-Feng/TSMC4MATH | [ICLR2025] Step-by-Step Reasoning for Math Problems via Twisted Sequential... | 13 | Experimental |