yifanzhang-pro/AutoMathText
[ACL 2025 Findings] Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts (featured in Hugging Face Daily Papers: https://huggingface.co/papers/2402.07625)
AutoMathText is a collection of roughly 200 GB of mathematical texts and code excerpts gathered from various online sources. Each item carries an 'LM score' between 0 and 1 that indicates its relevance, quality, and educational value for mathematical intelligence. The dataset is aimed at AI researchers, educators, and mathematics enthusiasts who need high-quality, pre-assessed mathematical content for learning, teaching, or training AI models.
Use this if you need a pre-scored, extensive dataset of mathematical texts and code to develop AI models for math, create educational materials, or conduct research at the intersection of mathematics and AI.
Not ideal if you require text outside of mathematics or if you prefer to manually curate and score your data.
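Because every item carries an LM score between 0 and 1, a common way to use such a dataset is to keep only the items above a chosen threshold. The sketch below illustrates that idea on in-memory records. The field name `lm_score`, the threshold of 0.7, and the sample records are all hypothetical and chosen for illustration; check the dataset's own documentation for the real schema.

```python
from typing import Dict, Iterable, Iterator


def filter_by_lm_score(
    records: Iterable[Dict],
    threshold: float = 0.7,
    score_key: str = "lm_score",  # hypothetical field name, not the confirmed schema
) -> Iterator[Dict]:
    """Yield only the records whose score is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    for record in records:
        score = record.get(score_key)
        # Skip records that have no score instead of guessing a value.
        if score is not None and score >= threshold:
            yield record


# Made-up sample records, for illustration only.
sample = [
    {"text": "Proof that sqrt(2) is irrational", "lm_score": 0.92},
    {"text": "Celebrity gossip roundup", "lm_score": 0.03},
    {"text": "Matrix multiplication walkthrough", "lm_score": 0.71},
]

kept = list(filter_by_lm_score(sample, threshold=0.7))
```

The function is a generator, so it can also wrap a streamed data source without loading all records into memory first.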
Stars: 90
Forks: 5
Language: Python
License: CC-BY-4.0
Category: (not listed)
Last pushed: Nov 23, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/yifanzhang-pro/AutoMathText"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
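The same request can be made from Python with the standard library. This is a minimal sketch under stated assumptions: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the meaning assigned to each path segment (`domain`, `owner`, `repo`) is inferred from the URL above, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(domain: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the segment names are assumptions, not documented API terms."""
    parts = (quote(part, safe="") for part in (domain, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(url: str, timeout: float = 10.0) -> str:
    """Perform the GET request and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


url = build_quality_url("transformers", "yifanzhang-pro", "AutoMathText")
# body = fetch_quality(url)  # uncomment to send the request (counts toward the daily limit)
```

The network call is left commented out so the snippet can be imported or run without consuming any of the daily request quota.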
Higher-rated alternatives
ExtensityAI/symbolicai
A neurosymbolic perspective on LLMs
TIGER-AI-Lab/MMLU-Pro
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding...
deep-symbolic-mathematics/LLM-SR
[ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on Scientific Equation...
microsoft/interwhen
A framework for verifiable reasoning with language models.
zhudotexe/fanoutqa
Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language...