yifanzhang-pro/AutoMathText

[ACL 2025 Findings] Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts (featured in Hugging Face Daily Papers: https://huggingface.co/papers/2402.07625)

39 / 100 (Emerging)

AutoMathText is a collection of around 200 GB of mathematical texts and code excerpts gathered from various online sources. Each item carries an 'LM score' (between 0 and 1) that indicates its relevance, quality, and educational value for mathematical intelligence. The dataset suits AI researchers, educators, and mathematics enthusiasts who need high-quality, pre-assessed mathematical content for learning, teaching, or training AI models.
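As a minimal sketch of how an LM score could drive data selection, the Python below keeps only records whose score meets a threshold. The record layout (a dict with `text` and `lm_score` keys), the sample records, and the 0.6 threshold are all hypothetical and chosen for illustration; the dataset's actual field names may differ.

```python
from typing import Iterable, Iterator


def filter_by_lm_score(
    records: Iterable[dict],
    threshold: float = 0.5,
    score_key: str = "lm_score",  # hypothetical field name, not confirmed by the source
) -> Iterator[dict]:
    """Yield records whose score is at least `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    for record in records:
        score = record.get(score_key)
        # Skip records that carry no score at all.
        if score is not None and score >= threshold:
            yield record


# Made-up sample records, for illustration only.
sample = [
    {"text": "Proof that sqrt(2) is irrational ...", "lm_score": 0.93},
    {"text": "Buy cheap watches online ...", "lm_score": 0.04},
    {"text": "Introduction to eigenvalues ...", "lm_score": 0.61},
]

kept = list(filter_by_lm_score(sample, threshold=0.6))
print(len(kept))  # prints 2
```

Raising the threshold trades volume for selectivity, so the right cutoff depends on how much data the downstream task needs.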

Use this if you need a pre-scored, extensive dataset of mathematical texts and code to develop AI models for math, create educational materials, or conduct research at the intersection of mathematics and AI.

Not ideal if you require text outside of mathematics or if you prefer to manually curate and score your data.

mathematical-research AI-model-training educational-content data-curation mathematical-intelligence
No Package, No Dependents
Maintenance 6 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 8 / 25

Stars: 90
Forks: 5
Language: Python
License: CC-BY-4.0
Last pushed: Nov 23, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/yifanzhang-pro/AutoMathText"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
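The same request can be sketched in Python using only the standard library. The base URL is copied from the curl command above; the helper names `quality_url` and `fetch_quality` are hypothetical, and parsing the response as JSON is an assumption, since the response format is not documented here.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl command; the "transformers" segment is kept as-is.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository, mirroring the curl command."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the endpoint and parse the body as JSON (assumed response format)."""
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to make the network request.
    print(quality_url("yifanzhang-pro", "AutoMathText"))
```

With the unauthenticated limit of 100 requests/day, caching the result locally is a sensible default for scripts that run repeatedly.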