GAIR-NLP/OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
This project offers a comprehensive pipeline for researchers and developers working with Large Language Models (LLMs) to explore how different mid-training strategies impact subsequent reinforcement learning (RL). It takes base LLMs and various mid-training recipes as input and produces models with stronger reasoning and self-reflection abilities, alongside detailed evaluation results. The target audience is AI researchers and engineers focused on advancing LLM capabilities through innovative training methods.
185 stars. No commits in the last 6 months.
Use this if you are an AI researcher or engineer who wants to study how mid-training choices influence model performance during reinforcement learning, especially for models that require strong reasoning and self-reflection.
Not ideal if you are a practitioner looking for a ready-to-use LLM for general applications without needing to delve into the specifics of pre-training and RL research.
Stars: 185
Forks: 14
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Jul 23, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/GAIR-NLP/OctoThinker"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
unslothai/unsloth
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama,...
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
modelscope/ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, ...)
oumi-ai/oumi
Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training