# Direct Preference Optimization Transformer Models
This page tracks 19 direct preference optimization models. The highest-rated is stair-lab/mlhp, scoring 49/100 with 30 GitHub stars.
Get all 19 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=direct-preference-optimization&limit=20"
```
The API is open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
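The query above can also be issued from Python. The sketch below builds the same URL and ranks entries from a response; the base URL and query parameters come from this page, but the JSON response shape (a list of objects with `model` and `score` fields) is an assumption, so adjust the field names to whatever the API actually returns.

```python
# Hedged sketch of querying the quality-dataset API shown above.
# The base URL and parameters come from this page; the response
# shape ('model' and 'score' fields) is an assumption.
import json
from urllib.parse import urlencode

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the query URL for a given domain and subcategory."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"

def top_models(payload: str, n: int = 3) -> list:
    """Parse a JSON response, assumed to be a list of objects with
    'model' and 'score' keys (hypothetical field names), and return
    the n highest-scoring entries."""
    items = json.loads(payload)
    return sorted(items, key=lambda r: r["score"], reverse=True)[:n]

# Reproduces the query from the curl command above:
url = build_url("transformers", "direct-preference-optimization", limit=20)
```

To actually fetch the data, pass `url` to `urllib.request.urlopen` or `requests.get`; if you have a free key, it would presumably be sent as a header or query parameter, but the exact mechanism is not documented on this page.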
| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | stair-lab/mlhp | Machine Learning from Human Preferences | 49 | Emerging |
| 2 | princeton-nlp/SimPO | [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward | | Emerging |
| 3 | uclaml/SPPO | The official implementation of Self-Play Preference Optimization (SPPO) | | Emerging |
| 4 | general-preference/general-preference-model | [ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for... | | Emerging |
| 5 | sail-sg/dice | Official implementation of Bootstrapping Language Models via DPO Implicit Rewards | | Emerging |
| 6 | line/sacpo | [NeurIPS 2024] SACPO (Stepwise Alignment for Constrained Policy Optimization) | | Experimental |
| 7 | JIA-Lab-research/Step-DPO | Implementation for "Step-DPO: Step-wise Preference Optimization for... | | Experimental |
| 8 | Meaquadddd/DPO-Shift | DPO-Shift: Shifting the Distribution of Direct Preference Optimization | | Experimental |
| 9 | csm9493/efficient-llm-unlearning | Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs (ICLR 2025) | | Experimental |
| 10 | li-plus/flash-preference | Accelerate LLM preference tuning via prefix sharing with a single line of code | | Experimental |
| 11 | chrisliu298/llm-unlearn-eco | [NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts | | Experimental |
| 12 | sahsaeedi/TPO | [TMLR] Triple Preference Optimization | | Experimental |
| 13 | sugarandgugu/Simple-Trl-Training | Fine-tuning large language models with the DPO algorithm; simple and easy to get started. | | Experimental |
| 14 | Rahulkumar010/microDPO | microDPO: A minimalist, pure PyTorch implementation of Direct Preference... | | Experimental |
| 15 | yflyzhang/RankPO | RankPO: Rank Preference Optimization | | Experimental |
| 16 | JIA-Lab-research/TGDPO | [ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing... | | Experimental |
| 17 | codebywiam/fine-tuning-llm-dpo | This project demonstrates how to fine-tune a GPT-2 model using Direct... | | Experimental |
| 18 | molereddy/Alternate-Preference-Optimization | [COLING 2025] code for "Alternate Preference Optimization for Unlearning... | | Experimental |
| 19 | mrunalmania/Direct-Preference-Optimization | In this repo, I've implemented the LLM alignment technique known as DPO and... | | Experimental |