Direct Preference Optimization Transformer Models

19 direct preference optimization projects are tracked here. The highest-rated is stair-lab/mlhp, scoring 49/100 with 30 GitHub stars.

Get all 19 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=direct-preference-optimization&limit=20"

Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
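Beyond curl, the endpoint can be consumed from a script. A minimal Python sketch, assuming the endpoint returns a JSON array of objects with `name` and `score` fields (the exact response schema is not documented on this page, so treat those keys as placeholders):

```python
import json
from urllib.request import urlopen

# Endpoint from the curl example above.
API_URL = (
    "https://pt-edge.onrender.com/api/v1/datasets/quality"
    "?domain=transformers&subcategory=direct-preference-optimization&limit=20"
)

def fetch_projects(url=API_URL):
    """Fetch the project list; anonymous access allows 100 requests/day."""
    with urlopen(url) as resp:
        return json.load(resp)

def top_projects(items, min_score=0):
    """Sort projects by score (descending), dropping those below min_score.

    Assumes each item is a dict with "name" and "score" keys; adjust to
    the actual schema returned by the API.
    """
    return sorted(
        (p for p in items if p.get("score", 0) >= min_score),
        key=lambda p: p["score"],
        reverse=True,
    )
```

Usage would look like `for p in top_projects(fetch_projects(), min_score=25): print(p["name"])`, which under the assumed schema prints the projects in the "Emerging" range and above.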

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | stair-lab/mlhp | Machine Learning from Human Preferences | 49 | Emerging |
| 2 | princeton-nlp/SimPO | [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward | 43 | Emerging |
| 3 | uclaml/SPPO | The official implementation of Self-Play Preference Optimization (SPPO) | 42 | Emerging |
| 4 | general-preference/general-preference-model | [ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for... | 36 | Emerging |
| 5 | sail-sg/dice | Official implementation of Bootstrapping Language Models via DPO Implicit Rewards | 33 | Emerging |
| 6 | line/sacpo | [NeurIPS 2024] SACPO (Stepwise Alignment for Constrained Policy Optimization) | 28 | Experimental |
| 7 | JIA-Lab-research/Step-DPO | Implementation for "Step-DPO: Step-wise Preference Optimization for... | 28 | Experimental |
| 8 | Meaquadddd/DPO-Shift | DPO-Shift: Shifting the Distribution of Direct Preference Optimization | 27 | Experimental |
| 9 | csm9493/efficient-llm-unlearning | Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs (ICLR 2025) | 27 | Experimental |
| 10 | li-plus/flash-preference | Accelerate LLM preference tuning via prefix sharing with a single line of code | 26 | Experimental |
| 11 | chrisliu298/llm-unlearn-eco | [NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts | 25 | Experimental |
| 12 | sahsaeedi/TPO | [TMLR] Triple Preference Optimization | 23 | Experimental |
| 13 | sugarandgugu/Simple-Trl-Training | Fine-tuning large language models with the DPO algorithm; simple and easy to get started. | 23 | Experimental |
| 14 | Rahulkumar010/microDPO | microDPO: A minimalist, pure PyTorch implementation of Direct Preference... | 23 | Experimental |
| 15 | yflyzhang/RankPO | RankPO: Rank Preference Optimization | 18 | Experimental |
| 16 | JIA-Lab-research/TGDPO | [ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing... | 14 | Experimental |
| 17 | codebywiam/fine-tuning-llm-dpo | This project demonstrates how to fine-tune a GPT-2 model using Direct... | 13 | Experimental |
| 18 | molereddy/Alternate-Preference-Optimization | [COLING 2025] code for "Alternate Preference Optimization for Unlearning... | 13 | Experimental |
| 19 | mrunalmania/Direct-Preference-Optimization | In this repo, I've implemented the LLM alignment technique known as DPO and... | 10 | Experimental |