Meaquadddd/DPO-Shift

DPO-Shift: Shifting the Distribution of Direct Preference Optimization

27 / 100
Experimental

This project offers a method to improve how large language models (LLMs) are fine-tuned on preference data. Starting from an existing supervised fine-tuned (SFT) model and a preference dataset, it applies a modified DPO training objective to produce a DPO-Shifted model that assigns higher probability to favored responses. It is aimed at machine learning engineers and researchers building and optimizing LLMs.

No commits in the last 6 months.

Use this if you are fine-tuning an LLM with Direct Preference Optimization (DPO) and want to counteract the known issue where the probability of the chosen (preferred) responses decreases during training; a sketch of the idea follows below.

Not ideal if you are looking for a ready-to-use LLM without needing to engage in the fine-tuning process, or if you are not familiar with DPO and LLM training pipelines.
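For orientation, here is a minimal sketch (not this repository's code) of a DPO-Shift-style loss: the standard DPO objective with the rejected-response log-ratio scaled by a factor f(λ) ≤ 1, intended to keep the chosen response's probability from drifting down during training. The function name, signature, default values, and the exact placement of f(λ) are assumptions based on the project's summary.

```python
import torch
import torch.nn.functional as F

def dpo_shift_loss(policy_chosen_logps: torch.Tensor,
                   policy_rejected_logps: torch.Tensor,
                   ref_chosen_logps: torch.Tensor,
                   ref_rejected_logps: torch.Tensor,
                   beta: float = 0.1,
                   f_lambda: float = 0.75) -> torch.Tensor:
    """Sketch of a DPO-Shift-style loss. Inputs are per-example summed
    log-probabilities of the chosen/rejected responses, shape (batch,)."""
    # Implicit rewards: log-ratios of the policy to the reference model.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Vanilla DPO corresponds to f_lambda = 1.0; DPO-Shift uses f_lambda < 1
    # to down-weight the rejected term (this placement of f(lambda) is an
    # assumption, not verified against the repository's code).
    logits = beta * (chosen_logratio - f_lambda * rejected_logratio)
    return -F.logsigmoid(logits).mean()
```

Setting f_lambda to 1.0 in this sketch recovers the vanilla DPO loss; values below 1 shift probability mass toward the chosen responses.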

Tags: LLM fine-tuning · preference optimization · natural language processing · model alignment · generative AI
No License · Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 8 / 25
Maturity: 8 / 25
Community: 11 / 25

Stars: 59
Forks: 6
Language: Python
License: None
Last pushed: Mar 05, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Meaquadddd/DPO-Shift"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
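The same data can be fetched programmatically; below is a minimal Python sketch using the requests library against the URL shown above. The response is assumed to be JSON, and its schema is not documented here, so the sketch prints the whole payload rather than guessing at field names.

```python
import requests

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/Meaquadddd/DPO-Shift")

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
data = resp.json()  # assumption: the endpoint returns a JSON document
print(data)
```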