RLHF Alignment Training for Transformer Models
Tools and frameworks for training language models using reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and related alignment techniques. Includes implementations of RLHF pipelines, preference learning methods, and safety-focused training approaches. Does NOT include general safety evaluation, jailbreak detection, or post-hoc alignment analysis without training components.
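Of the techniques in scope, DPO is the simplest to state: it minimizes a logistic loss on the policy's log-probability margin between a chosen and a rejected response, measured relative to a frozen reference model. A minimal scalar sketch (the repositories below implement the batched tensor version; the function name and signature here are illustrative, not taken from any listed project):

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Direct Preference Optimization loss for a single preference pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model; beta scales the
    implicit reward.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the reference, minus the same quantity for the rejected one.
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): shrinks as the margin favors the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With a zero margin the loss is log(2) ~ 0.6931; a positive margin drives it toward 0.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))
```

No reward model or on-policy sampling is needed, which is why many of the lighter-weight projects below implement DPO rather than full PPO-based RLHF.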
There are 123 RLHF alignment training projects tracked. Nine score above 50 (the established tier). The highest-rated is agentscope-ai/Trinity-RFT at 69/100 with 557 stars. Three of the top 10 are actively maintained.
Get the projects as JSON (raise the `limit` query parameter to fetch all 123):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=rlhf-alignment-training&limit=20"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
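The same call can be built in Python with the standard library. Only the endpoint and query parameters above are taken from this page; the response schema is not documented here, so parsing the JSON body (e.g. with `json.load` over `urllib.request.urlopen(url)`) is left to the reader:

```python
from urllib.parse import urlencode

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def quality_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the dataset query URL shown in the curl example above."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"

# Request the full list of 123 projects in one page.
url = quality_url("transformers", "rlhf-alignment-training", limit=123)
print(url)
```

`urlencode` handles query-string escaping, so subcategory slugs with special characters stay valid.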
| # | Model | Description | Tier |
|---|---|---|---|
| 1 | agentscope-ai/Trinity-RFT | Trinity-RFT is a general-purpose, flexible and scalable framework designed... | Established |
| 2 | OpenRLHF/OpenRLHF | An Easy-to-use, Scalable and High-performance Agentic RL Framework based on... | Established |
| 3 | zjunlp/EasyEdit | [ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs. | Established |
| 4 | huggingface/alignment-handbook | Robust recipes to align language models with human and AI preferences | Established |
| 5 | hyunwoongko/nanoRLHF | nanoRLHF: from-scratch journey into how LLMs and RLHF really work. | Established |
| 6 | PKU-Alignment/align-anything | Align Anything: Training All-modality Model with Feedback | Established |
| 7 | PKU-Alignment/safe-rlhf | Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from... | Established |
| 8 | opendilab/LightRFT | LightRFT: Light, Efficient, Omni-modal & Reward-model Driven Reinforcement... | Established |
| 9 | Gen-Verse/dLLM-RL | [ICLR 2026] Official code for TraceRL: Revolutionizing post-training for... | Established |
| 10 | hscspring/hcgf | Humanable Chat Generative-model Fine-tuning (LLM fine-tuning) | Emerging |
| 11 | conceptofmind/LaMDA-rlhf-pytorch | Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding... | Emerging |
| 12 | sinanuozdemir/oreilly-llm-rl-alignment | This training offers an intensive exploration into the frontier of... | Emerging |
| 13 | hiyouga/ChatGLM-Efficient-Tuning | Fine-tuning ChatGLM-6B with PEFT (efficient ChatGLM fine-tuning based on PEFT) | Emerging |
| 14 | NVlabs/RLP | [ICLR 2026] Official PyTorch Implementation of RLP: Reinforcement as a... | Emerging |
| 15 | RLHFlow/RLHF-Reward-Modeling | Recipes to train reward models for RLHF. | Emerging |
| 16 | hiyouga/FastEdit | 🩹 Editing large language models within 10 seconds ⚡ | Emerging |
| 17 | OPTML-Group/Unlearn-Simple | [NeurIPS 2025] Official repo for "Simplicity Prevails: Rethinking Negative... | Emerging |
| 18 | uclaml/SPIN | The official implementation of Self-Play Fine-Tuning (SPIN) | Emerging |
| 19 | xyjigsaw/LLM-Pretrain-SFT | Scripts for LLM pre-training and fine-tuning (with/without LoRA, DeepSpeed) | Emerging |
| 20 | tatsu-lab/alpaca_farm | A simulation framework for RLHF and alternatives. Develop your RLHF method... | Emerging |
| 21 | ZinYY/Online_RLHF | A PyTorch implementation of the paper "Provably Efficient Online RLHF with... | Emerging |
| 22 | nickduran/align2-linguistic-alignment | ALIGN 2.0: Modern Python package for multi-level linguistic alignment... | Emerging |
| 23 | pratyushasharma/laser | The Truth Is In There: Improving Reasoning in Language Models with... | Emerging |
| 24 | l294265421/alpaca-rlhf | Finetuning LLaMA with RLHF (Reinforcement Learning from Human Feedback)... | Emerging |
| 25 | WayneJin0918/SRUM | Official repo of the paper "SRUM: Fine-Grained Self-Rewarding for Unified... | Emerging |
| 26 | NVlabs/Long-RL | Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) | Emerging |
| 27 | WangJingyao07/Awesome-GRPO | Codebase of GRPO: Implementations and Resources of GRPO and Its Variants | Emerging |
| 28 | complex-reasoning/RPG | [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) | Emerging |
| 29 | nicola-decao/KnowledgeEditor | Code for Editing Factual Knowledge in Language Models | Emerging |
| 30 | jackaduma/Vicuna-LoRA-RLHF-PyTorch | A full pipeline to finetune the Vicuna LLM with LoRA and RLHF on consumer... | Emerging |
| 31 | openpsi-project/ReaLHF | Super-Efficient RLHF Training of LLMs with Parameter Reallocation | Emerging |
| 32 | daniel-furman/sft-demos | Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and... | Emerging |
| 33 | rosinality/halite | Acceleration framework for Human Alignment Learning | Emerging |
| 34 | tomekkorbak/pretraining-with-human-feedback | Code accompanying the paper Pretraining Language Models with Human Preferences | Emerging |
| 35 | RishabSA/interp-refusal-tokens | We study whether categorical refusal tokens enable controllable and... | Emerging |
| 36 | HKUNLP/icl-ceil | [ICML 2023] Code for our paper "Compositional Exemplars for In-context Learning". | Emerging |
| 37 | zjunlp/Mol-Instructions | [ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset... | Emerging |
| 38 | jackaduma/ChatGLM-LoRA-RLHF-PyTorch | A full pipeline to finetune the ChatGLM LLM with LoRA and RLHF on consumer... | Emerging |
| 39 | abenechehab/dicl | [ICLR 2025] Official implementation of DICL (Disentangled In-Context... | Emerging |
| 40 | AIFrameResearch/SPO | Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL... | Emerging |
| 41 | kaistAI/Janus | [NeurIPS 2024] Train LLMs with diverse system messages reflecting... | Emerging |
| 42 | tlc4418/llm_optimization | A repo for RLHF training and BoN over LLMs, with support for reward model ensembles. | Emerging |
| 43 | TideDra/VL-RLHF | An RLHF infrastructure for vision-language models | Emerging |
| 44 | jackaduma/Alpaca-LoRA-RLHF-PyTorch | A full pipeline to finetune the Alpaca LLM with LoRA and RLHF on consumer... | Emerging |
| 45 | NVlabs/NFT | Implementation of the Negative-aware Finetuning (NFT) algorithm for "Bridging... | Emerging |
| 46 | qizhou000/UniEdit | [NeurIPS 2025 B & D] UniEdit: A Unified Knowledge Editing Benchmark for... | Emerging |
| 47 | GithubX-F/DynaMO-RL | Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization... | Emerging |
| 48 | CLAIRE-Labo/quantile-reward-policy-optimization | Official codebase for "Quantile Reward Policy Optimization: Alignment with... | Emerging |
| 49 | ZJLAB-AMMI/LLM4Teach | Python code to implement LLM4Teach, a policy distillation approach for... | Emerging |
| 50 | RLHFlow/Online-RLHF | A recipe for online RLHF and online iterative DPO. | Emerging |
| 51 | PKU-Alignment/beavertails | BeaverTails is a collection of datasets designed to facilitate research on... | Emerging |
| 52 | holarissun/RewardModelingBeyondBradleyTerry | Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models... | Emerging |
| 53 | LunjunZhang/ema-pg | Code for "EMA Policy Gradient: Taming Reinforcement Learning for LLMs with... | Emerging |
| 54 | yaojin17/Unlearning_LLM | [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large... | Emerging |
| 55 | WooooDyy/BAPO | Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for... | Emerging |
| 56 | YJiangcm/LTE | [ACL 2024] Learning to Edit: Aligning LLMs with Knowledge Editing | Emerging |
| 57 | CJReinforce/PURE | Official code for the paper "Stop Summation: Min-Form Credit Assignment Is... | Emerging |
| 58 | liziniu/policy_optimization | Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" | Emerging |
| 59 | nlp-uoregon/Okapi | Okapi: Instruction-tuned Large Language Models in Multiple Languages with... | Emerging |
| 60 | NiuTrans/Vision-LLM-Alignment | This repository contains the code for SFT, RLHF, and DPO, designed for... | Emerging |
| 61 | seonghyeonye/Flipped-Learning | [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models... | Emerging |
| 62 | twitter-research/multilingual-alignment-tpp | Code for reproducing the paper Improved Multilingual Language Model... | Emerging |
| 63 | ksm26/Reinforcement-Learning-from-Human-Feedback | Embark on the "Reinforcement Learning from Human Feedback" course and align... | Emerging |
| 64 | astorfi/LLM-Alignment-Project | A comprehensive template for aligning large language models (LLMs) using... | Experimental |
| 65 | liziniu/ReMax | Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement... | Experimental |
| 66 | InternLM/Spark | An official implementation of "SPARK: Synergistic Policy And Reward... | Experimental |
| 67 | mintaywon/IF_RLHF | Source code for "Understanding impacts of human feedback via influence functions" | Experimental |
| 68 | YukinoshitaKaren/Reason-KE | [EMNLP 2025 Findings] Robust Knowledge Editing via Explicit Reasoning Chains... | Experimental |
| 69 | Yellow4Submarine7/LLMDoctor | 🩺 Token-Level Flow-Guided Preference Optimization for Efficient Test-Time... | Experimental |
| 70 | aerosta/rewardhackwatch | Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1... | Experimental |
| 71 | li-plus/nanoRLHF | Train a tiny LLaMA model from scratch to repeat your words using... | Experimental |
| 72 | gao-g/prelude | Code for the paper "Aligning LLM Agents by Learning Latent Preference from... | Experimental |
| 73 | haozheji/exact-optimization | [ICML 2024] Official repository for EXO: Towards Efficient Exact... | Experimental |
| 74 | pangatlo/RL-100 | 🤖 Implement advanced robotic manipulation techniques using real-world... | Experimental |
| 75 | wangclnlp/DeepSpeed-Chat-Extension | Extensions of deepspeed-chat for fine-tuning LLMs (SFT + RLHF). | Experimental |
| 76 | Manohara-Ai/Reinforcement_Learning_Framework_to_Prevent_Jailbreaks | A reinforcement learning-based system designed to detect and prevent... | Experimental |
| 77 | RLHF-V/RLHF-V | [CVPR 2024] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from... | Experimental |
| 78 | thinkwee/NOVER | [EMNLP 2025] R1-Zero on ANY TASK | Experimental |
| 79 | RUCKBReasoning/CodeRM | Official code implementation for the ACL 2025 paper "Dynamic Scaling of... | Experimental |
| 80 | rafaelvp-db/hf-finetune | Fine-tuning a GPT model using the Persuasion for Good dataset. | Experimental |
| 81 | 5663015/LLMs_train | Instruction fine-tuning of large language models with a single codebase | Experimental |
| 82 | yihedeng9/rlhf-summary-notes | A brief and partial summary of RLHF algorithms. | Experimental |
| 83 | ssbuild/llm_rlhf | Reinforcement learning training for GPT-2, LLaMA, BLOOM, and other LLMs | Experimental |
| 84 | SharathHebbar/sft_mathgpt2 | Supervised fine-tuning using the TRL library | Experimental |
| 85 | bhimanbaghel/ResolveUnderOverEdit | Official implementation of "Resolving UnderEdit & OverEdit with Iterative &... | Experimental |
| 86 | clam004/minichatgpt | Annotated tutorial of the Hugging Face TRL repo for reinforcement learning... | Experimental |
| 87 | VoxDroid/llm-wikipedia | A project for fine-tuning large language models (LLMs) on curated Wikipedia... | Experimental |
| 88 | pleiadian53/llm-lab | A research sandbox for LLM pretraining, fine-tuning (SFT, DPO, RLHF), and... | Experimental |
| 89 | sailik1991/deal | Decoding Time Alignment Search | Experimental |
| 90 | herbitovich/ai-alignment | Implementation of the REINFORCE algorithm within RLHF for LM alignment. | Experimental |
| 91 | PKU-Alignment/llms-resist-alignment | [ACL 2025 Best Paper] Language Models Resist Alignment | Experimental |
| 92 | kylebrussell/cap-rlvr | CAP RLVR: Reinforcement Learning from Human Feedback for Legal Reasoning... | Experimental |
| 93 | 313mystery303/vla0-trl | 🔍 Explore a minimal reimplementation of VLA-0 with TRL, achieving 90% LIBERO... | Experimental |
| 94 | Dylsimple60/RLHF_learn | 🤖 Enhance reinforcement learning stability and efficiency with advanced... | Experimental |
| 95 | ducnh279/Align-LLMs-with-DPO | Align a large language model (LLM) with the DPO loss | Experimental |
| 96 | Martin-qyma/TRM | From Faithfulness to Correctness: Generative Reward Models that Think Critically | Experimental |
| 97 | balnarendrasapa/faq-llm | Course project for DSCI 6004 that deals with fine-tuning a pretrained... | Experimental |
| 98 | sathishkumar67/GPT-2-IMDB-Sentiment-Fine-Tuning-with-PPO | Implements the Proximal Policy Optimization (PPO) algorithm to fine-tune a... | Experimental |
| 99 | rxian/domain-alignment | Code for importance-weighted domain alignment, and the paper "Cross-Lingual... | Experimental |
| 100 | Daddy-Myth/Fine-tuning-Flan-T5-RLHF | Aligning FLAN-T5 with Reinforcement Learning from Human Feedback (RLHF) for... | Experimental |
| 101 | ma-spie/LLM_metaphor_detection | Repository for the paper "Literary Metaphor Detection with LLM Fine-Tuning... | Experimental |
| 102 | closestfriend/efficient-domain-adaptation | Research repository for Brie: LLM-assisted data authoring methodology... | Experimental |
| 103 | DolbyUUU/DeepEnlighten | Pure RL to post-train base models for social reasoning capabilities.... | Experimental |
| 104 | SafeRL-Lab/TeaMs-RL | [TMLR] TeaMs-RL: Teaching LLMs to Generate Better Instruction Datasets via... | Experimental |
| 105 | Yousifus/rlhf_loop_humain | RLHF loop system: learning project with monitoring dashboard, drift... | Experimental |
| 106 | fake-it0628/jailbreak-defense | Jailbreak defense system based on hidden-state causal monitoring for LLMs | Experimental |
| 107 | liziniu/cold_start_rl | Code for the blog post "Can Better Cold-Start Strategies Improve RL Training for LLMs?" | Experimental |
| 108 | kantkrishan0206-crypto/AlignGPT | This project implements a mini LLM alignment pipeline using Reinforcement... | Experimental |
| 109 | DanielSc4/RewardLM | Reward a Language Model with pancakes 🥞 | Experimental |
| 110 | pradeepiyer/nothing-gpt | SFT + DPO fine-tuned model about Nothing. | Experimental |
| 111 | Jason-Wang313/Drift-Bench | Quantifying the "Safety Half-Life" of LLMs: a framework to measure how... | Experimental |
| 112 | fabiantoh98/llm-preference-learning | End-to-end LLM preference learning pipeline: training, evaluation, and... | Experimental |
| 113 | cluebbers/dpo-rlhf-paraphrase-types | Enhancing paraphrase-type generation using Direct Preference Optimization... | Experimental |
| 114 | MiuLab/DogeRM | The code used in the paper "DogeRM: Equipping Reward Models with Domain... | Experimental |
| 115 | YukinoshitaKaren/X_KDE | [ACL 2025 Findings] Edit Once, Update Everywhere: A Simple Framework for... | Experimental |
| 116 | nabeelshan78/reinforcement-learning-human-feedback-scratch | End-to-end implementation of Reinforcement Learning with Human Feedback... | Experimental |
| 117 | rasyosef/phi-2-sft-and-dpo | Notebooks to create an instruction-following version of Microsoft's Phi-2... | Experimental |
| 118 | MilyaushaShamsutdinova/REINFORCE_research | REINFORCE with baseline: algorithm implementation and exploration of its variations | Experimental |
| 119 | mahshid1378/Project-vLLM | An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full... | Experimental |
| 120 | aditi-bhaskar/multiturn-20q | Multiturn RLHF applied to the 20 questions game through proxy rewards to... | Experimental |
| 121 | NotShrirang/PaliGemma | A vision-language model implemented in PyTorch | Experimental |
| 122 | Chinmaya-Kausik/RLHF-comparison | Comparing various RLHF methods | Experimental |
| 123 | thisarakaushan/Reinforcement-Learning-From-Human-Feedback | Understanding of Reinforcement Learning from Human Feedback (RLHF) and... | Experimental |