iiis-ai/TemplateMath
[ICLR 2025 DATA-FM] Training and Evaluating Language Models with Template-based Data Generation (https://arxiv.org/abs/2411.18104)
This project helps AI researchers and machine learning engineers train and evaluate large language models (LLMs) on mathematical reasoning. It provides a dataset of over 7.4 million synthetically generated grade-school math problems, each paired with a natural-language explanation and a programmatically verified code solution.
Use this if you are an AI researcher or machine learning engineer looking for a large-scale, programmatically verified dataset to train or fine-tune language models for mathematical reasoning.
Not ideal if you are looking for real-world, human-generated math problems or if your primary focus is on non-mathematical language tasks.
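To make the idea of template-based generation with verified code solutions concrete, here is a minimal sketch in Python. It is not the project's actual code: the template text, the names, the number ranges, and the helpers `generate_problem` and `verify` are all hypothetical, chosen only to illustrate how one template can yield many problems whose answers are checked by running code.

```python
import random

# Hypothetical template; the wording, names, and ranges are illustrative only
# and are not taken from the TemplateMath repository.
TEMPLATE = (
    "{name} has {a} {item}. {name} buys {b} more and then gives away {c}. "
    "How many {item} does {name} have now?"
)


def generate_problem(rng):
    """Fill the template with random parameters and attach both solution forms."""
    name = rng.choice(["Ava", "Ben", "Chen"])
    item = rng.choice(["apples", "pencils", "marbles"])
    a = rng.randint(5, 50)
    b = rng.randint(1, 20)
    c = rng.randint(1, a + b)  # never give away more than is available
    answer = a + b - c
    return {
        "question": TEMPLATE.format(name=name, item=item, a=a, b=b, c=c),
        "explanation": (
            f"{name} starts with {a}, adds {b} to get {a + b}, "
            f"then gives away {c}, leaving {answer}."
        ),
        "code": f"result = {a} + {b} - {c}",
        "answer": answer,
    }


def verify(problem):
    """Execute the code solution and check that it reproduces the stored answer."""
    scope = {}
    exec(problem["code"], {}, scope)
    return scope["result"] == problem["answer"]


rng = random.Random(0)  # fixed seed so the sample is reproducible
problems = [generate_problem(rng) for _ in range(3)]
```

Calling `verify(p)` on each generated problem keeps only items whose code solution agrees with the stored answer, which is the sense of "programmatically verified" assumed in this sketch.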
Stars: 13
Forks: 1
Language: Python
License: not specified
Category: not specified
Last pushed: Nov 11, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/iiis-ai/TemplateMath"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
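The same request can be made from Python with only the standard library. This is a rough sketch under two assumptions that the listing does not confirm: that the endpoint returns a JSON body, and that other repositories follow the same `/<owner>/<repo>` path pattern. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner, repo):
    """Build the endpoint URL for one repository (path pattern is an assumption)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the record for a repository without an API key.

    Assumption: the response body is JSON; the listing does not document it.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_quality("iiis-ai", "TemplateMath")` issues the same GET request as the curl command. Keyless access is limited to 100 requests/day, so cache results when looping over many repositories.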
Higher-rated alternatives
VikParuchuri/textbook_quality
Generate textbook-quality synthetic LLM pretraining data
dmanuel64/codablellm
A framework for creating and curating high-quality code datasets tailored for large language models
BhabhaAI/dataformer
Solving data for LLMs - Create quality synthetic datasets!
BothBosu/Synthetic-Data-for-Scam-Detection-Leveraging-LLMs-to-Train-Deep-Learning-Models
This repository contains the source code and synthetic datasets used in the research on scam...
MichiganNLP/depression_synthetic_data
Can LMs generate useful synthetic data for the mental health domain?