GAIR-NLP/OctoThinker

Revisiting Mid-training in the Era of Reinforcement Learning Scaling

Score: 39 / 100 (Emerging)

This project offers a comprehensive pipeline for researchers and developers working with Large Language Models (LLMs) to explore how different mid-training strategies impact subsequent reinforcement learning (RL) stages. It takes base LLMs and various mid-training strategies as input, producing models with stronger reasoning and self-reflection abilities, alongside detailed evaluation results. The target audience is AI researchers and engineers focused on advancing LLM capabilities through innovative training methods.

185 stars. No commits in the last 6 months.

Use this if you are an AI researcher or engineer who needs to experiment with how mid-training strategies influence model performance during reinforcement learning, especially for models that require strong reasoning and self-reflection.

Not ideal if you are a practitioner who wants a ready-to-use LLM for general applications and has no need to delve into the specifics of pre-training and RL research.

Tags: AI-research, large-language-models, reinforcement-learning, model-pre-training, natural-language-processing
Status: stale for 6 months; not published as a package; no known dependents.

Score breakdown (the four components sum to the overall 39 / 100):

Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 15 / 25
Community: 12 / 25

Stars: 185
Forks: 14
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Jul 23, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/GAIR-NLP/OctoThinker"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
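
If you prefer to script the call, here is a minimal Python sketch using only the standard library. It assumes nothing beyond the endpoint URL shown above; since the response schema is not documented on this page, the snippet simply fetches and pretty-prints the raw JSON for inspection.

import json
import urllib.request

# Endpoint copied verbatim from the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/GAIR-NLP/OctoThinker"

# Fetch the quality report (no API key is required at the free tier).
with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# The payload's field names are undocumented here, so print everything
# and adapt once you have seen the actual schema.
print(json.dumps(data, indent=2))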