princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
SimPO helps large language model (LLM) developers fine-tune models to better align with human preferences. Given a base LLM and a dataset of preferred and dispreferred responses, it produces a refined model that generates more helpful, higher-quality text. It is aimed at data scientists and machine learning engineers who deploy and improve conversational AI or text-generation systems.
946 stars. No commits in the last 6 months.
Use this if you need to optimize a large language model to produce outputs that consistently match human preferences for quality and helpfulness, especially when a simpler, more efficient approach is desired.
Not ideal if you are looking for a pre-trained, ready-to-use LLM for general tasks without custom fine-tuning or if you lack the technical expertise to work with model training frameworks.
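The "simpler, more efficient approach" refers to SimPO's reference-free objective: instead of DPO's reward derived from a frozen reference model, SimPO scores each response by its length-normalized average log-probability under the policy and enforces a target margin between preferred and dispreferred responses. A minimal pure-Python sketch of the pairwise loss (the toy log-probabilities and hyperparameter values below are illustrative, not the repo's defaults):

```python
import math

def simpo_loss(logps_chosen, logps_rejected, beta=2.0, gamma=1.0):
    """SimPO pairwise loss for one preference pair.

    logps_chosen / logps_rejected: per-token log-probabilities the policy
    assigns to the preferred and dispreferred responses. The implicit
    reward is beta times the length-normalized average log-probability,
    so no reference model is required.
    """
    r_chosen = beta * sum(logps_chosen) / len(logps_chosen)
    r_rejected = beta * sum(logps_rejected) / len(logps_rejected)
    margin = r_chosen - r_rejected - gamma  # gamma is the target reward margin
    # Loss is -log(sigmoid(margin)): small when chosen beats rejected by > gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy example: the chosen response has a higher average log-probability
chosen = [-0.2, -0.3, -0.1]       # avg -0.2  -> reward -0.4
rejected = [-1.0, -1.2, -0.9, -1.1]  # avg -1.05 -> reward -2.1
loss = simpo_loss(chosen, rejected)  # margin 0.7, loss ≈ 0.403
```

In practice the repo trains full models with batched tensor versions of this objective; the length normalization is what lets SimPO drop the reference model while discouraging length exploitation.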
Stars: 946
Forks: 73
Language: Python
License: MIT
Category:
Last pushed: Feb 16, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/princeton-nlp/SimPO"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
stair-lab/mlhp
Machine Learning from Human Preferences
uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
general-preference/general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment...
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
line/sacpo
[NeurIPS 2024] SACPO (Stepwise Alignment for Constrained Policy Optimization)