RLHF Alignment Training for Transformer Models
Tools and frameworks for training language models using reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and related alignment techniques. Includes implementations of RLHF pipelines, preference learning methods, and safety-focused training approaches. Does NOT include general safety evaluation, jailbreak detection, or post-hoc alignment analysis without training components.
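Of the techniques in scope, DPO is the simplest to state: it minimizes a logistic loss on the policy's log-probability margin between a chosen and a rejected response, measured relative to a frozen reference model. A minimal scalar sketch (the repositories below implement the batched tensor version; the function name and signature here are illustrative, not taken from any listed project):

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Direct Preference Optimization loss for a single preference pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model; beta scales the
    implicit reward.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the reference, minus the same quantity for the rejected one.
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): shrinks as the margin favors the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With a zero margin the loss is log(2) ~ 0.6931; a positive margin drives it toward 0.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))
```

No reward model or on-policy sampling is needed, which is why many of the lighter-weight projects below implement DPO rather than full PPO-based RLHF.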
There are 123 RLHF alignment training projects tracked. Nine score above 50 (the established tier). The highest-rated is agentscope-ai/Trinity-RFT at 69/100 with 557 stars. Three of the top 10 are actively maintained.
Get the projects as JSON (raise the `limit` query parameter to fetch all 123):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=rlhf-alignment-training&limit=20"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
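The same call can be built in Python with the standard library. Only the endpoint and query parameters above are taken from this page; the response schema is not documented here, so parsing the JSON body (e.g. with `json.load` over `urllib.request.urlopen(url)`) is left to the reader:

```python
from urllib.parse import urlencode

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def quality_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the dataset query URL shown in the curl example above."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"

# Request the full list of 123 projects in one page.
url = quality_url("transformers", "rlhf-alignment-training", limit=123)
print(url)
```

`urlencode` handles query-string escaping, so subcategory slugs with special characters stay valid.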
| # | Model | Description | Tier |
|---|---|---|---|
| 1 | agentscope-ai/Trinity-RFT | Trinity-RFT is a general-purpose, flexible and scalable framework designed... | Established |
| 2 | OpenRLHF/OpenRLHF | An Easy-to-use, Scalable and High-performance Agentic RL Framework based on... | Established |
| 3 | zjunlp/EasyEdit | [ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs. | Established |
| 4 | huggingface/alignment-handbook | Robust recipes to align language models with human and AI preferences | Established |
| 5 | hyunwoongko/nanoRLHF | nanoRLHF: from-scratch journey into how LLMs and RLHF really work. | Established |
| 6 | PKU-Alignment/align-anything | Align Anything: Training All-modality Model with Feedback | Established |
| 7 | PKU-Alignment/safe-rlhf | Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from... | Established |
| 8 | opendilab/LightRFT | LightRFT: Light, Efficient, Omni-modal & Reward-model Driven Reinforcement... | Established |
| 9 | Gen-Verse/dLLM-RL | [ICLR 2026] Official code for TraceRL: Revolutionizing post-training for... | Established |
| 10 | hscspring/hcgf | Humanable Chat Generative-model Fine-tuning (LLM fine-tuning) | Emerging |
| 11 | conceptofmind/LaMDA-rlhf-pytorch | Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding... | Emerging |
| 12 | sinanuozdemir/oreilly-llm-rl-alignment | This training offers an intensive exploration into the frontier of... | Emerging |
| 13 | hiyouga/ChatGLM-Efficient-Tuning | Fine-tuning ChatGLM-6B with PEFT (efficient ChatGLM fine-tuning based on PEFT) | Emerging |
| 14 | NVlabs/RLP | [ICLR 2026] Official PyTorch Implementation of RLP: Reinforcement as a... | Emerging |
| 15 | RLHFlow/RLHF-Reward-Modeling | Recipes to train reward models for RLHF. | Emerging |
| 16 | hiyouga/FastEdit | 🩹 Editing large language models within 10 seconds ⚡ | Emerging |
| 17 | OPTML-Group/Unlearn-Simple | [NeurIPS 2025] Official repo for "Simplicity Prevails: Rethinking Negative... | Emerging |
| 18 | uclaml/SPIN | The official implementation of Self-Play Fine-Tuning (SPIN) | Emerging |
| 19 | xyjigsaw/LLM-Pretrain-SFT | Scripts for LLM pre-training and fine-tuning (with/without LoRA, DeepSpeed) | Emerging |
| 20 | tatsu-lab/alpaca_farm | A simulation framework for RLHF and alternatives. Develop your RLHF method... | Emerging |
| 21 | ZinYY/Online_RLHF | A PyTorch implementation of the paper "Provably Efficient Online RLHF with... | Emerging |
| 22 | nickduran/align2-linguistic-alignment | ALIGN 2.0: Modern Python package for multi-level linguistic alignment... | Emerging |
| 23 | pratyushasharma/laser | The Truth Is In There: Improving Reasoning in Language Models with... | Emerging |
| 24 | l294265421/alpaca-rlhf | Finetuning LLaMA with RLHF (Reinforcement Learning from Human Feedback)... | Emerging |
| 25 | WayneJin0918/SRUM | Official repo of the paper "SRUM: Fine-Grained Self-Rewarding for Unified... | Emerging |
| 26 | NVlabs/Long-RL | Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) | Emerging |
| 27 | WangJingyao07/Awesome-GRPO | Codebase of GRPO: Implementations and Resources of GRPO and Its Variants | Emerging |
| 28 | complex-reasoning/RPG | [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) | Emerging |
| 29 | nicola-decao/KnowledgeEditor | Code for Editing Factual Knowledge in Language Models | Emerging |
| 30 | jackaduma/Vicuna-LoRA-RLHF-PyTorch | A full pipeline to finetune the Vicuna LLM with LoRA and RLHF on consumer... | Emerging |
| 31 | openpsi-project/ReaLHF | Super-Efficient RLHF Training of LLMs with Parameter Reallocation | Emerging |
| 32 | daniel-furman/sft-demos | Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and... | Emerging |
| 33 | rosinality/halite | Acceleration framework for Human Alignment Learning | Emerging |
| 34 | tomekkorbak/pretraining-with-human-feedback | Code accompanying the paper Pretraining Language Models with Human Preferences | Emerging |
| 35 | RishabSA/interp-refusal-tokens | We study whether categorical refusal tokens enable controllable and... | Emerging |
| 36 | HKUNLP/icl-ceil | [ICML 2023] Code for our paper "Compositional Exemplars for In-context Learning". | Emerging |
| 37 | zjunlp/Mol-Instructions | [ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset... | Emerging |
| 38 | jackaduma/ChatGLM-LoRA-RLHF-PyTorch | A full pipeline to finetune the ChatGLM LLM with LoRA and RLHF on consumer... | Emerging |
| 39 | abenechehab/dicl | [ICLR 2025] Official implementation of DICL (Disentangled In-Context... | Emerging |
| 40 | AIFrameResearch/SPO | Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL... | Emerging |
| 41 | kaistAI/Janus | [NeurIPS 2024] Train LLMs with diverse system messages reflecting... | Emerging |
| 42 | tlc4418/llm_optimization | A repo for RLHF training and BoN over LLMs, with support for reward model ensembles. | Emerging |
| 43 | TideDra/VL-RLHF | An RLHF infrastructure for vision-language models | Emerging |
| 44 | jackaduma/Alpaca-LoRA-RLHF-PyTorch | A full pipeline to finetune the Alpaca LLM with LoRA and RLHF on consumer... | Emerging |
| 45 | NVlabs/NFT | Implementation of the Negative-aware Finetuning (NFT) algorithm for "Bridging... | Emerging |
| 46 | qizhou000/UniEdit | [NeurIPS 2025 B & D] UniEdit: A Unified Knowledge Editing Benchmark for... | Emerging |
| 47 | GithubX-F/DynaMO-RL | Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization... | Emerging |
| 48 | CLAIRE-Labo/quantile-reward-policy-optimization | Official codebase for "Quantile Reward Policy Optimization: Alignment with... | Emerging |
| 49 | ZJLAB-AMMI/LLM4Teach | Python code to implement LLM4Teach, a policy distillation approach for... | Emerging |
| 50 | RLHFlow/Online-RLHF | A recipe for online RLHF and online iterative DPO. | Emerging |
| 51 | PKU-Alignment/beavertails | BeaverTails is a collection of datasets designed to facilitate research on... | Emerging |
| 52 | holarissun/RewardModelingBeyondBradleyTerry | Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models... | Emerging |
| 53 | LunjunZhang/ema-pg | Code for "EMA Policy Gradient: Taming Reinforcement Learning for LLMs with... | Emerging |
| 54 | yaojin17/Unlearning_LLM | [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large... | Emerging |
| 55 | WooooDyy/BAPO | Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for... | Emerging |
| 56 | YJiangcm/LTE | [ACL 2024] Learning to Edit: Aligning LLMs with Knowledge Editing | Emerging |
| 57 | CJReinforce/PURE | Official code for the paper "Stop Summation: Min-Form Credit Assignment Is... | Emerging |
| 58 | liziniu/policy_optimization | Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" | Emerging |
| 59 | nlp-uoregon/Okapi | Okapi: Instruction-tuned Large Language Models in Multiple Languages with... | Emerging |
| 60 | NiuTrans/Vision-LLM-Alignment | This repository contains the code for SFT, RLHF, and DPO, designed for... | Emerging |
| 61 | seonghyeonye/Flipped-Learning | [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models... | Emerging |
| 62 | twitter-research/multilingual-alignment-tpp | Code for reproducing the paper Improved Multilingual Language Model... | Emerging |
| 63 | ksm26/Reinforcement-Learning-from-Human-Feedback | Embark on the "Reinforcement Learning from Human Feedback" course and align... | Emerging |
| 64 | astorfi/LLM-Alignment-Project | A comprehensive template for aligning large language models (LLMs) using... | Experimental |
| 65 | liziniu/ReMax | Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement... | Experimental |
| 66 | InternLM/Spark | An official implementation of "SPARK: Synergistic Policy And Reward... | Experimental |
| 67 | mintaywon/IF_RLHF | Source code for "Understanding impacts of human feedback via influence functions" | Experimental |
| 68 | YukinoshitaKaren/Reason-KE | [EMNLP 2025 Findings] Robust Knowledge Editing via Explicit Reasoning Chains... | Experimental |
| 69 | Yellow4Submarine7/LLMDoctor | 🩺 Token-Level Flow-Guided Preference Optimization for Efficient Test-Time... | Experimental |
| 70 | aerosta/rewardhackwatch | Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1... | Experimental |
| 71 | li-plus/nanoRLHF | Train a tiny LLaMA model from scratch to repeat your words using... | Experimental |
| 72 | gao-g/prelude | Code for the paper "Aligning LLM Agents by Learning Latent Preference from... | Experimental |
| 73 | haozheji/exact-optimization | [ICML 2024] Official repository for EXO: Towards Efficient Exact... | Experimental |
| 74 | pangatlo/RL-100 | 🤖 Implement advanced robotic manipulation techniques using real-world... | Experimental |
| 75 | wangclnlp/DeepSpeed-Chat-Extension | Extensions of deepspeed-chat for fine-tuning LLMs (SFT + RLHF). | Experimental |
| 76 | Manohara-Ai/Reinforcement_Learning_Framework_to_Prevent_Jailbreaks | A reinforcement learning-based system designed to detect and prevent... | Experimental |
| 77 | RLHF-V/RLHF-V | [CVPR 2024] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from... | Experimental |
| 78 | thinkwee/NOVER | [EMNLP 2025] R1-Zero on ANY TASK | Experimental |
| 79 | RUCKBReasoning/CodeRM | Official code implementation for the ACL 2025 paper "Dynamic Scaling of... | Experimental |
| 80 | rafaelvp-db/hf-finetune | Fine-tuning a GPT model using the Persuasion for Good dataset. | Experimental |
| 81 | 5663015/LLMs_train | Instruction fine-tuning of large language models with a single codebase | Experimental |
| 82 | yihedeng9/rlhf-summary-notes | A brief and partial summary of RLHF algorithms. | Experimental |
| 83 | ssbuild/llm_rlhf | Reinforcement learning training for GPT-2, LLaMA, BLOOM, and other LLMs | Experimental |
| 84 | SharathHebbar/sft_mathgpt2 | Supervised fine-tuning using the TRL library | Experimental |
| 85 | bhimanbaghel/ResolveUnderOverEdit | Official implementation of "Resolving UnderEdit & OverEdit with Iterative &... | Experimental |
| 86 | clam004/minichatgpt | Annotated tutorial of the Hugging Face TRL repo for reinforcement learning... | Experimental |
| 87 | VoxDroid/llm-wikipedia | A project for fine-tuning large language models (LLMs) on curated Wikipedia... | Experimental |
| 88 | pleiadian53/llm-lab | A research sandbox for LLM pretraining, fine-tuning (SFT, DPO, RLHF), and... | Experimental |
| 89 | sailik1991/deal | Decoding Time Alignment Search | Experimental |
| 90 | herbitovich/ai-alignment | Implementation of the REINFORCE algorithm within RLHF for LM alignment. | Experimental |
| 91 | PKU-Alignment/llms-resist-alignment | [ACL 2025 Best Paper] Language Models Resist Alignment | Experimental |
| 92 | kylebrussell/cap-rlvr | CAP RLVR: Reinforcement Learning from Human Feedback for Legal Reasoning... | Experimental |
| 93 | 313mystery303/vla0-trl | 🔍 Explore a minimal reimplementation of VLA-0 with TRL, achieving 90% LIBERO... | Experimental |
| 94 | Dylsimple60/RLHF_learn | 🤖 Enhance reinforcement learning stability and efficiency with advanced... | Experimental |
| 95 | ducnh279/Align-LLMs-with-DPO | Align a large language model (LLM) with the DPO loss | Experimental |
| 96 | Martin-qyma/TRM | From Faithfulness to Correctness: Generative Reward Models that Think Critically | Experimental |
| 97 | balnarendrasapa/faq-llm | Course project for DSCI 6004 that deals with fine-tuning a pretrained... | Experimental |
| 98 | sathishkumar67/GPT-2-IMDB-Sentiment-Fine-Tuning-with-PPO | Implements the Proximal Policy Optimization (PPO) algorithm to fine-tune a... | Experimental |
| 99 | rxian/domain-alignment | Code for importance-weighted domain alignment, and the paper "Cross-Lingual... | Experimental |
| 100 | Daddy-Myth/Fine-tuning-Flan-T5-RLHF | Aligning FLAN-T5 with Reinforcement Learning from Human Feedback (RLHF) for... | Experimental |
| 101 | ma-spie/LLM_metaphor_detection | Repository for the paper "Literary Metaphor Detection with LLM Fine-Tuning... | Experimental |
| 102 | closestfriend/efficient-domain-adaptation | Research repository for Brie: LLM-assisted data authoring methodology... | Experimental |
| 103 | DolbyUUU/DeepEnlighten | Pure RL to post-train base models for social reasoning capabilities.... | Experimental |
| 104 | SafeRL-Lab/TeaMs-RL | [TMLR] TeaMs-RL: Teaching LLMs to Generate Better Instruction Datasets via... | Experimental |
| 105 | Yousifus/rlhf_loop_humain | RLHF loop system: learning project with monitoring dashboard, drift... | Experimental |
| 106 | fake-it0628/jailbreak-defense | Jailbreak defense system based on hidden-state causal monitoring for LLMs | Experimental |
| 107 | liziniu/cold_start_rl | Code for the blog post "Can Better Cold-Start Strategies Improve RL Training for LLMs?" | Experimental |
| 108 | kantkrishan0206-crypto/AlignGPT | This project implements a mini LLM alignment pipeline using Reinforcement... | Experimental |
| 109 | DanielSc4/RewardLM | Reward a Language Model with pancakes 🥞 | Experimental |
| 110 | pradeepiyer/nothing-gpt | SFT + DPO fine-tuned model about Nothing. | Experimental |
| 111 | Jason-Wang313/Drift-Bench | Quantifying the "Safety Half-Life" of LLMs: a framework to measure how... | Experimental |
| 112 | fabiantoh98/llm-preference-learning | End-to-end LLM preference learning pipeline: training, evaluation, and... | Experimental |
| 113 | cluebbers/dpo-rlhf-paraphrase-types | Enhancing paraphrase-type generation using Direct Preference Optimization... | Experimental |
| 114 | MiuLab/DogeRM | The code used in the paper "DogeRM: Equipping Reward Models with Domain... | Experimental |
| 115 | YukinoshitaKaren/X_KDE | [ACL 2025 Findings] Edit Once, Update Everywhere: A Simple Framework for... | Experimental |
| 116 | nabeelshan78/reinforcement-learning-human-feedback-scratch | End-to-end implementation of Reinforcement Learning with Human Feedback... | Experimental |
| 117 | rasyosef/phi-2-sft-and-dpo | Notebooks to create an instruction-following version of Microsoft's Phi-2... | Experimental |
| 118 | MilyaushaShamsutdinova/REINFORCE_research | REINFORCE with baseline: algorithm implementation and exploration of its variations | Experimental |
| 119 | mahshid1378/Project-vLLM | An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full... | Experimental |
| 120 | aditi-bhaskar/multiturn-20q | Multiturn RLHF applied to the 20 questions game through proxy rewards to... | Experimental |
| 121 | NotShrirang/PaliGemma | A vision-language model implemented in PyTorch | Experimental |
| 122 | Chinmaya-Kausik/RLHF-comparison | Comparing various RLHF methods | Experimental |
| 123 | thisarakaushan/Reinforcement-Learning-From-Human-Feedback | Understanding of Reinforcement Learning from Human Feedback (RLHF) and... | Experimental |