GAIR-NLP/OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
This project offers a comprehensive pipeline for researchers and developers working with Large Language Models (LLMs) to explore how different mid-training strategies impact subsequent reinforcement learning (RL). It takes base LLMs and various mid-training recipes as input and produces models with stronger reasoning and self-reflection abilities, alongside detailed evaluation results. The target audience is AI researchers and engineers focused on advancing LLM capabilities through innovative training methods.
185 stars. No commits in the last 6 months.
Use this if you are an AI researcher or engineer who wants to study how mid-training choices influence model performance during reinforcement learning, especially for models that require strong reasoning and self-reflection.
Not ideal if you are a practitioner looking for a ready-to-use LLM for general applications without needing to delve into the specifics of pre-training and RL research.
Stars: 185
Forks: 14
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Jul 23, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/GAIR-NLP/OctoThinker"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
unslothai/unsloth
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama,...
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
modelscope/ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, ...)
oumi-ai/oumi
Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training