# Direct Preference Optimization Transformer Models
This page tracks 19 direct preference optimization models. The highest-rated is stair-lab/mlhp, scoring 49/100 with 30 GitHub stars.
Get all 19 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=direct-preference-optimization&limit=20"
```
The API is open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
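The query above can also be issued from Python. The sketch below builds the same URL and ranks entries from a response; the base URL and query parameters come from this page, but the JSON response shape (a list of objects with `model` and `score` fields) is an assumption, so adjust the field names to whatever the API actually returns.

```python
# Hedged sketch of querying the quality-dataset API shown above.
# The base URL and parameters come from this page; the response
# shape ('model' and 'score' fields) is an assumption.
import json
from urllib.parse import urlencode

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the query URL for a given domain and subcategory."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"

def top_models(payload: str, n: int = 3) -> list:
    """Parse a JSON response, assumed to be a list of objects with
    'model' and 'score' keys (hypothetical field names), and return
    the n highest-scoring entries."""
    items = json.loads(payload)
    return sorted(items, key=lambda r: r["score"], reverse=True)[:n]

# Reproduces the query from the curl command above:
url = build_url("transformers", "direct-preference-optimization", limit=20)
```

To actually fetch the data, pass `url` to `urllib.request.urlopen` or `requests.get`; if you have a free key, it would presumably be sent as a header or query parameter, but the exact mechanism is not documented on this page.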
| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | stair-lab/mlhp | Machine Learning from Human Preferences | 49 | Emerging |
| 2 | princeton-nlp/SimPO | [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward | | Emerging |
| 3 | uclaml/SPPO | The official implementation of Self-Play Preference Optimization (SPPO) | | Emerging |
| 4 | general-preference/general-preference-model | [ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for... | | Emerging |
| 5 | sail-sg/dice | Official implementation of Bootstrapping Language Models via DPO Implicit Rewards | | Emerging |
| 6 | line/sacpo | [NeurIPS 2024] SACPO (Stepwise Alignment for Constrained Policy Optimization) | | Experimental |
| 7 | JIA-Lab-research/Step-DPO | Implementation for "Step-DPO: Step-wise Preference Optimization for... | | Experimental |
| 8 | Meaquadddd/DPO-Shift | DPO-Shift: Shifting the Distribution of Direct Preference Optimization | | Experimental |
| 9 | csm9493/efficient-llm-unlearning | Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs (ICLR 2025) | | Experimental |
| 10 | li-plus/flash-preference | Accelerate LLM preference tuning via prefix sharing with a single line of code | | Experimental |
| 11 | chrisliu298/llm-unlearn-eco | [NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts | | Experimental |
| 12 | sahsaeedi/TPO | [TMLR] Triple Preference Optimization | | Experimental |
| 13 | sugarandgugu/Simple-Trl-Training | Fine-tuning large language models with the DPO algorithm; simple and easy to get started. | | Experimental |
| 14 | Rahulkumar010/microDPO | microDPO: A minimalist, pure PyTorch implementation of Direct Preference... | | Experimental |
| 15 | yflyzhang/RankPO | RankPO: Rank Preference Optimization | | Experimental |
| 16 | JIA-Lab-research/TGDPO | [ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing... | | Experimental |
| 17 | codebywiam/fine-tuning-llm-dpo | This project demonstrates how to fine-tune a GPT-2 model using Direct... | | Experimental |
| 18 | molereddy/Alternate-Preference-Optimization | [COLING 2025] code for "Alternate Preference Optimization for Unlearning... | | Experimental |
| 19 | mrunalmania/Direct-Preference-Optimization | In this repo, I've implemented the LLM alignment technique known as DPO and... | | Experimental |