GAIR-NLP/MegaScience
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
MegaScience helps scientists and researchers build or improve AI models designed for complex scientific reasoning. It provides high-quality datasets of millions of science reasoning questions and answers, extracted from university textbooks across seven disciplines. The output is an AI model that can understand and answer intricate scientific problems, making it easier to develop AI scientists or research assistants.
113 stars.
Use this if you are a researcher or institution looking to develop or enhance AI models that can accurately perform scientific reasoning tasks across various disciplines.
Not ideal if you need a pre-built, off-the-shelf AI model for general knowledge or non-scientific tasks.
Stars
113
Forks
6
Language
Python
License
Apache-2.0
Category
Last pushed
Feb 02, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/GAIR-NLP/MegaScience"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
unslothai/unsloth
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama,...
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
modelscope/ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5,...
oumi-ai/oumi
Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training