NVlabs/RLP
[ICLR 2026] Official PyTorch Implementation of RLP: Reinforcement as a Pretraining Objective
This project helps AI researchers and developers train large language models (LLMs) that "think" before generating answers. It adds a reinforcement learning objective during pretraining, which teaches the model to generate intermediate reasoning steps. The result is LLMs that produce more accurate and robust outputs on complex tasks, especially in math and science.
Use this if you are pre-training large language models and want to instill strong reasoning capabilities and improved accuracy from the very beginning, without significantly increasing computational cost.
Not ideal if you are looking for a tool to fine-tune an already pre-trained model or if your primary goal is to optimize for speed over complex reasoning.
Stars: 241
Forks: 16
Language: not listed
License: not listed
Category: not listed
Last pushed: Jan 26, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NVlabs/RLP"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
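The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, the category/owner/repo path layout is inferred from the curl example above, and the code assumes the endpoint returns JSON, which this page does not state.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is
# assumed to follow a <category>/<owner>/<repo> layout.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    parts = (category, owner, repo)
    # Percent-encode each path segment so unusual names cannot break the URL.
    return BASE_URL + "/" + "/".join(quote(p, safe="") for p in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record without an API key (the 100 requests/day tier).

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError, and HTTP errors surface as urllib.error.HTTPError.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to hit the network.
    print(quality_url("transformers", "NVlabs", "RLP"))
```

Calling `fetch_quality("transformers", "NVlabs", "RLP")` issues the same GET request as the curl command. The response fields are not documented here, so inspect the returned object before relying on any particular key.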
Higher-rated alternatives
agentscope-ai/Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF
nanoRLHF: from-scratch journey into how LLMs and RLHF really work.