RLHF and Alignment Training for Transformer Models

Tools and frameworks for training language models using reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and related alignment techniques. Includes implementations of RLHF pipelines, preference learning methods, and safety-focused training approaches. Does NOT include general safety evaluation, jailbreak detection, or post-hoc alignment analysis without training components.

There are 123 RLHF alignment training projects tracked. Nine score above 50 (the Established tier). The highest-rated is agentscope-ai/Trinity-RFT at 69/100 with 557 stars. Three of the top 10 are actively maintained.
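The tier boundaries can be inferred from the listing itself: every score of 50 or above is Established, scores from 30 through 49 are Emerging, and scores below 30 are Experimental. A minimal sketch of that mapping, assuming those inferred cutoffs hold exactly:

```python
def tier(score: int) -> str:
    """Map a 0-100 quality score to its tier.

    Cutoffs are inferred from the listing: 50+ Established,
    30-49 Emerging, below 30 Experimental.
    """
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


print(tier(69), tier(49), tier(29))  # Trinity-RFT, hcgf, LLM-Alignment-Project
```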

Get all 123 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=rlhf-alignment-training&limit=20"
```

The API is open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
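Once fetched, the JSON can be filtered client-side, for example to keep only Established-tier projects. A minimal sketch using values from the listing below; the `projects`, `name`, `score`, and `tier` field names are assumptions about the response shape, so check the actual payload before relying on them:

```python
import json

# Hypothetical response shape -- the real schema of the quality endpoint
# may differ; the field names here are illustrative assumptions.
sample = json.loads("""
{"projects": [
  {"name": "agentscope-ai/Trinity-RFT", "score": 69, "tier": "Established"},
  {"name": "OpenRLHF/OpenRLHF", "score": 66, "tier": "Established"},
  {"name": "hscspring/hcgf", "score": 49, "tier": "Emerging"}
]}
""")

# Keep only Established-tier projects, sorted by score descending.
established = sorted(
    (p for p in sample["projects"] if p["tier"] == "Established"),
    key=lambda p: p["score"],
    reverse=True,
)
print([p["name"] for p in established])
```

In a real client the `sample` literal would be replaced by the body of the curl request above.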

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | agentscope-ai/Trinity-RFT | Trinity-RFT is a general-purpose, flexible and scalable framework designed... | 69 | Established |
| 2 | OpenRLHF/OpenRLHF | An Easy-to-use, Scalable and High-performance Agentic RL Framework based on... | 66 | Established |
| 3 | zjunlp/EasyEdit | [ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs. | 60 | Established |
| 4 | huggingface/alignment-handbook | Robust recipes to align language models with human and AI preferences | 56 | Established |
| 5 | hyunwoongko/nanoRLHF | nanoRLHF: from-scratch journey into how LLMs and RLHF really work. | 56 | Established |
| 6 | PKU-Alignment/align-anything | Align Anything: Training All-modality Model with Feedback | 53 | Established |
| 7 | PKU-Alignment/safe-rlhf | Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from... | 51 | Established |
| 8 | opendilab/LightRFT | LightRFT: Light, Efficient, Omni-modal & Reward-model Driven Reinforcement... | 51 | Established |
| 9 | Gen-Verse/dLLM-RL | [ICLR 2026] Official code for TraceRL: Revolutionizing post-training for... | 50 | Established |
| 10 | hscspring/hcgf | Humanable Chat Generative-model Fine-tuning \| LLM fine-tuning | 49 | Emerging |
| 11 | conceptofmind/LaMDA-rlhf-pytorch | Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding... | 47 | Emerging |
| 12 | sinanuozdemir/oreilly-llm-rl-alignment | This training offers an intensive exploration into the frontier of... | 47 | Emerging |
| 13 | hiyouga/ChatGLM-Efficient-Tuning | Fine-tuning ChatGLM-6B with PEFT \| Efficient ChatGLM fine-tuning based on PEFT | 47 | Emerging |
| 14 | NVlabs/RLP | [ICLR 2026] Official PyTorch Implementation of RLP: Reinforcement as a... | 46 | Emerging |
| 15 | RLHFlow/RLHF-Reward-Modeling | Recipes to train reward model for RLHF. | 46 | Emerging |
| 16 | hiyouga/FastEdit | 🩹Editing large language models within 10 seconds⚡ | 44 | Emerging |
| 17 | OPTML-Group/Unlearn-Simple | [NeurIPS25] Official repo for "Simplicity Prevails: Rethinking Negative... | 44 | Emerging |
| 18 | uclaml/SPIN | The official implementation of Self-Play Fine-Tuning (SPIN) | 44 | Emerging |
| 19 | xyjigsaw/LLM-Pretrain-SFT | Scripts of LLM pre-training and fine-tuning (w/wo LoRA, DeepSpeed) | 42 | Emerging |
| 20 | tatsu-lab/alpaca_farm | A simulation framework for RLHF and alternatives. Develop your RLHF method... | 42 | Emerging |
| 21 | ZinYY/Online_RLHF | A PyTorch implementation of the paper "Provably Efficient Online RLHF with... | 42 | Emerging |
| 22 | nickduran/align2-linguistic-alignment | ALIGN 2.0: Modern Python package for multi-level linguistic alignment... | 42 | Emerging |
| 23 | pratyushasharma/laser | The Truth Is In There: Improving Reasoning in Language Models with... | 41 | Emerging |
| 24 | l294265421/alpaca-rlhf | Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback)... | 40 | Emerging |
| 25 | WayneJin0918/SRUM | Official repo of paper "SRUM: Fine-Grained Self-Rewarding for Unified... | 39 | Emerging |
| 26 | NVlabs/Long-RL | Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) | 39 | Emerging |
| 27 | WangJingyao07/Awesome-GRPO | Codebase of GRPO: Implementations and Resources of GRPO and Its Variants | 39 | Emerging |
| 28 | complex-reasoning/RPG | [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) | 39 | Emerging |
| 29 | nicola-decao/KnowledgeEditor | Code for Editing Factual Knowledge in Language Models | 39 | Emerging |
| 30 | jackaduma/Vicuna-LoRA-RLHF-PyTorch | A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer... | 39 | Emerging |
| 31 | openpsi-project/ReaLHF | Super-Efficient RLHF Training of LLMs with Parameter Reallocation | 38 | Emerging |
| 32 | daniel-furman/sft-demos | Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and... | 38 | Emerging |
| 33 | rosinality/halite | Acceleration framework for Human Alignment Learning | 38 | Emerging |
| 34 | tomekkorbak/pretraining-with-human-feedback | Code accompanying the paper Pretraining Language Models with Human Preferences | 38 | Emerging |
| 35 | RishabSA/interp-refusal-tokens | We study whether categorical refusal tokens enable controllable and... | 38 | Emerging |
| 36 | HKUNLP/icl-ceil | [ICML 2023] Code for our paper "Compositional Exemplars for In-context Learning". | 37 | Emerging |
| 37 | zjunlp/Mol-Instructions | [ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset... | 37 | Emerging |
| 38 | jackaduma/ChatGLM-LoRA-RLHF-PyTorch | A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer... | 36 | Emerging |
| 39 | abenechehab/dicl | [ICLR 2025] Official implementation of DICL (Disentangled In-Context... | 36 | Emerging |
| 40 | AIFrameResearch/SPO | Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL... | 36 | Emerging |
| 41 | kaistAI/Janus | [NeurIPS 2024] Train LLMs with diverse system messages reflecting... | 36 | Emerging |
| 42 | tlc4418/llm_optimization | A repo for RLHF training and BoN over LLMs, with support for reward model ensembles. | 36 | Emerging |
| 43 | TideDra/VL-RLHF | A RLHF Infrastructure for Vision-Language Models | 35 | Emerging |
| 44 | jackaduma/Alpaca-LoRA-RLHF-PyTorch | A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer... | 35 | Emerging |
| 45 | NVlabs/NFT | Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging... | 35 | Emerging |
| 46 | qizhou000/UniEdit | [NeurIPS 2025 B & D] UniEdit: A Unified Knowledge Editing Benchmark for... | 35 | Emerging |
| 47 | GithubX-F/DynaMO-RL | Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization... | 34 | Emerging |
| 48 | CLAIRE-Labo/quantile-reward-policy-optimization | Official codebase for "Quantile Reward Policy Optimization: Alignment with... | 34 | Emerging |
| 49 | ZJLAB-AMMI/LLM4Teach | Python code to implement LLM4Teach, a policy distillation approach for... | 34 | Emerging |
| 50 | RLHFlow/Online-RLHF | A recipe for online RLHF and online iterative DPO. | 34 | Emerging |
| 51 | PKU-Alignment/beavertails | BeaverTails is a collection of datasets designed to facilitate research on... | 34 | Emerging |
| 52 | holarissun/RewardModelingBeyondBradleyTerry | Official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models... | 34 | Emerging |
| 53 | LunjunZhang/ema-pg | Code for "EMA Policy Gradient: Taming Reinforcement Learning for LLMs with... | 33 | Emerging |
| 54 | yaojin17/Unlearning_LLM | [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large... | 33 | Emerging |
| 55 | WooooDyy/BAPO | Codes for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for... | 33 | Emerging |
| 56 | YJiangcm/LTE | [ACL 2024] Learning to Edit: Aligning LLMs with Knowledge Editing | 33 | Emerging |
| 57 | CJReinforce/PURE | Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is... | 32 | Emerging |
| 58 | liziniu/policy_optimization | Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data) | 31 | Emerging |
| 59 | nlp-uoregon/Okapi | Okapi: Instruction-tuned Large Language Models in Multiple Languages with... | 31 | Emerging |
| 60 | NiuTrans/Vision-LLM-Alignment | This repository contains the code for SFT, RLHF, and DPO, designed for... | 31 | Emerging |
| 61 | seonghyeonye/Flipped-Learning | [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models... | 31 | Emerging |
| 62 | twitter-research/multilingual-alignment-tpp | Code for reproducing the paper Improved Multilingual Language Model... | 30 | Emerging |
| 63 | ksm26/Reinforcement-Learning-from-Human-Feedback | Embark on the "Reinforcement Learning from Human Feedback" course and align... | 30 | Emerging |
| 64 | astorfi/LLM-Alignment-Project | A comprehensive template for aligning large language models (LLMs) using... | 29 | Experimental |
| 65 | liziniu/ReMax | Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement... | 29 | Experimental |
| 66 | InternLM/Spark | An official implementation of "SPARK: Synergistic Policy And Reward... | 28 | Experimental |
| 67 | mintaywon/IF_RLHF | Source code for 'Understanding impacts of human feedback via influence functions' | 28 | Experimental |
| 68 | YukinoshitaKaren/Reason-KE | [EMNLP 2025 Findings] Robust Knowledge Editing via Explicit Reasoning Chains... | 28 | Experimental |
| 69 | Yellow4Submarine7/LLMDoctor | 🩺 Token-Level Flow-Guided Preference Optimization for Efficient Test-Time... | 27 | Experimental |
| 70 | aerosta/rewardhackwatch | Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1... | 27 | Experimental |
| 71 | li-plus/nanoRLHF | Train a tiny LLaMA model from scratch to repeat your words using... | 27 | Experimental |
| 72 | gao-g/prelude | Code for the paper "Aligning LLM Agents by Learning Latent Preference from... | 27 | Experimental |
| 73 | haozheji/exact-optimization | ICML 2024 - Official Repository for EXO: Towards Efficient Exact... | 27 | Experimental |
| 74 | pangatlo/RL-100 | 🤖 Implement advanced robotic manipulation techniques using real-world... | 26 | Experimental |
| 75 | wangclnlp/DeepSpeed-Chat-Extension | This repo contains some extensions of deepspeed-chat for fine-tuning LLMs (SFT+RLHF). | 26 | Experimental |
| 76 | Manohara-Ai/Reinforcement_Learning_Framework_to_Prevent_Jailbreaks | A reinforcement learning-based system designed to detect and prevent... | 26 | Experimental |
| 77 | RLHF-V/RLHF-V | [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from... | 25 | Experimental |
| 78 | thinkwee/NOVER | [EMNLP-2025] R1-Zero on ANY TASK | 24 | Experimental |
| 79 | RUCKBReasoning/CodeRM | Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of... | 24 | Experimental |
| 80 | rafaelvp-db/hf-finetune | Fine tuning a GPT model using the Persuasion for Good dataset. | 23 | Experimental |
| 81 | 5663015/LLMs_train | One codebase for instruction fine-tuning of large models | 23 | Experimental |
| 82 | yihedeng9/rlhf-summary-notes | A brief and partial summary of RLHF algorithms. | 23 | Experimental |
| 83 | ssbuild/llm_rlhf | Reinforcement learning training for GPT-2, LLaMA, BLOOM, and other LLMs | 22 | Experimental |
| 84 | SharathHebbar/sft_mathgpt2 | Supervised Fine tuning using TRL library | 22 | Experimental |
| 85 | bhimanbaghel/ResolveUnderOverEdit | Official implementation of "Resolving UnderEdit & OverEdit with Iterative &... | 22 | Experimental |
| 86 | clam004/minichatgpt | Annotated tutorial of the huggingface TRL repo for reinforcement learning... | 22 | Experimental |
| 87 | VoxDroid/llm-wikipedia | A project for fine-tuning large language models (LLMs) on curated Wikipedia... | 22 | Experimental |
| 88 | pleiadian53/llm-lab | A research sandbox for LLM pretraining, fine-tuning (SFT, DPO, RLHF), and... | 21 | Experimental |
| 89 | sailik1991/deal | Decoding Time Alignment Search | 21 | Experimental |
| 90 | herbitovich/ai-alignment | Implementing the REINFORCE algorithm in the process of RLHF for LM alignment. | 21 | Experimental |
| 91 | PKU-Alignment/llms-resist-alignment | [ACL2025 Best Paper] Language Models Resist Alignment | 21 | Experimental |
| 92 | kylebrussell/cap-rlvr | CAP RLVR: Reinforcement Learning from Human Feedback for Legal Reasoning... | 20 | Experimental |
| 93 | 313mystery303/vla0-trl | 🔍 Explore a minimal reimplementation of VLA-0 with TRL, achieving 90% LIBERO... | 20 | Experimental |
| 94 | Dylsimple60/RLHF_learn | 🤖 Enhance reinforcement learning stability and efficiency with advanced... | 20 | Experimental |
| 95 | ducnh279/Align-LLMs-with-DPO | Align a Large Language Model (LLM) with DPO loss | 20 | Experimental |
| 96 | Martin-qyma/TRM | From Faithfulness to Correctness: Generative Reward Models that Think Critically | 20 | Experimental |
| 97 | balnarendrasapa/faq-llm | Course project for DSCI 6004 on fine-tuning a pretrained... | 19 | Experimental |
| 98 | sathishkumar67/GPT-2-IMDB-Sentiment-Fine-Tuning-with-PPO | Implemented the Proximal Policy Optimization (PPO) algorithm to fine-tune a... | 19 | Experimental |
| 99 | rxian/domain-alignment | Code for importance-weighted domain alignment, and the paper "Cross-Lingual... | 19 | Experimental |
| 100 | Daddy-Myth/Fine-tuning-Flan-T5-RLHF | Aligning FLAN-T5 with Reinforcement Learning from Human Feedback (RLHF) for... | 18 | Experimental |
| 101 | ma-spie/LLM_metaphor_detection | Repository for the paper "Literary Metaphor Detection with LLM Fine-Tuning... | 18 | Experimental |
| 102 | closestfriend/efficient-domain-adaptation | Research repository for Brie: LLM-assisted data authoring methodology... | 16 | Experimental |
| 103 | DolbyUUU/DeepEnlighten | Pure RL to post-train base models for social reasoning capabilities.... | 15 | Experimental |
| 104 | SafeRL-Lab/TeaMs-RL | [TMLR] TeaMs-RL: Teaching LLMs to Generate Better Instruction Datasets via... | 14 | Experimental |
| 105 | Yousifus/rlhf_loop_humain | RLHF Loop System - Learning project with monitoring dashboard, drift... | 14 | Experimental |
| 106 | fake-it0628/jailbreak-defense | Jailbreak Defense System based on Hidden State Causal Monitoring for LLMs | 14 | Experimental |
| 107 | liziniu/cold_start_rl | Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs? | 14 | Experimental |
| 108 | kantkrishan0206-crypto/AlignGPT | This project implements a mini LLM alignment pipeline using Reinforcement... | 14 | Experimental |
| 109 | DanielSc4/RewardLM | Reward a Language Model with pancakes 🥞 | 13 | Experimental |
| 110 | pradeepiyer/nothing-gpt | SFT + DPO fine tuned model about Nothing. | 13 | Experimental |
| 111 | Jason-Wang313/Drift-Bench | Quantifying the "Safety Half-Life" of LLMs: A framework to measure how... | 13 | Experimental |
| 112 | fabiantoh98/llm-preference-learning | End-to-end LLM preference learning pipeline: training, evaluation, and... | 13 | Experimental |
| 113 | cluebbers/dpo-rlhf-paraphrase-types | Enhancing paraphrase-type generation using Direct Preference Optimization... | 13 | Experimental |
| 114 | MiuLab/DogeRM | The code used in the paper "DogeRM: Equipping Reward Models with Domain... | 12 | Experimental |
| 115 | YukinoshitaKaren/X_KDE | [ACL 2025 Findings] Edit Once, Update Everywhere: A Simple Framework for... | 12 | Experimental |
| 116 | nabeelshan78/reinforcement-learning-human-feedback-scratch | End-to-end implementation of Reinforcement Learning with Human Feedback... | 11 | Experimental |
| 117 | rasyosef/phi-2-sft-and-dpo | Notebooks to create an instruction following version of Microsoft's Phi 2... | 11 | Experimental |
| 118 | MilyaushaShamsutdinova/REINFORCE_research | REINFORCE w/ baseline algorithm implementation and exploration of its variation | 11 | Experimental |
| 119 | mahshid1378/Project-vLLM | An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full... | 11 | Experimental |
| 120 | aditi-bhaskar/multiturn-20q | Multiturn RLHF applied to the 20 questions game through proxy rewards to... | 11 | Experimental |
| 121 | NotShrirang/PaliGemma | A Vision Language Model implemented in PyTorch | 11 | Experimental |
| 122 | Chinmaya-Kausik/RLHF-comparison | Comparing various RLHF methods | 11 | Experimental |
| 123 | thisarakaushan/Reinforcement-Learning-From-Human-Feedback | Understanding of Reinforcement Learning from Human Feedback (RLHF) and... | 10 | Experimental |

Comparisons in this category