princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
SimPO helps large language model (LLM) developers fine-tune models to better align with human preferences. Given a base LLM and a dataset of preferred and dispreferred responses, it produces a refined model that generates more helpful, higher-quality text. It is aimed at data scientists and machine learning engineers who deploy and improve conversational AI or text-generation systems.
946 stars. No commits in the last 6 months.
Use this if you need to optimize a large language model to produce outputs that consistently match human preferences for quality and helpfulness, especially when a simpler, more efficient approach is desired.
Not ideal if you are looking for a pre-trained, ready-to-use LLM for general tasks without custom fine-tuning or if you lack the technical expertise to work with model training frameworks.
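The "simpler, more efficient approach" refers to SimPO's reference-free objective: instead of DPO's reward derived from a frozen reference model, SimPO scores each response by its length-normalized average log-probability under the policy and enforces a target margin between preferred and dispreferred responses. A minimal pure-Python sketch of the pairwise loss (the toy log-probabilities and hyperparameter values below are illustrative, not the repo's defaults):

```python
import math

def simpo_loss(logps_chosen, logps_rejected, beta=2.0, gamma=1.0):
    """SimPO pairwise loss for one preference pair.

    logps_chosen / logps_rejected: per-token log-probabilities the policy
    assigns to the preferred and dispreferred responses. The implicit
    reward is beta times the length-normalized average log-probability,
    so no reference model is required.
    """
    r_chosen = beta * sum(logps_chosen) / len(logps_chosen)
    r_rejected = beta * sum(logps_rejected) / len(logps_rejected)
    margin = r_chosen - r_rejected - gamma  # gamma is the target reward margin
    # Loss is -log(sigmoid(margin)): small when chosen beats rejected by > gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy example: the chosen response has a higher average log-probability
chosen = [-0.2, -0.3, -0.1]       # avg -0.2  -> reward -0.4
rejected = [-1.0, -1.2, -0.9, -1.1]  # avg -1.05 -> reward -2.1
loss = simpo_loss(chosen, rejected)  # margin 0.7, loss ≈ 0.403
```

In practice the repo trains full models with batched tensor versions of this objective; the length normalization is what lets SimPO drop the reference model while discouraging length exploitation.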
Stars: 946
Forks: 73
Language: Python
License: MIT
Category:
Last pushed: Feb 16, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/princeton-nlp/SimPO"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
stair-lab/mlhp
Machine Learning from Human Preferences
uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
general-preference/general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment...
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
line/sacpo
[NeurIPS 2024] SACPO (Stepwise Alignment for Constrained Policy Optimization)