uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
This project helps large language model (LLM) developers enhance their models' performance without relying on external, costly feedback like GPT-4 evaluations. It takes an existing LLM and improves its ability to generate high-quality, aligned responses. The main users are researchers and engineers who develop and fine-tune LLMs.
583 stars. No commits in the last 6 months.
Use this if you are a machine learning engineer or researcher looking to significantly improve the alignment and response quality of your large language models using an efficient self-play framework.
Not ideal if you are an end-user simply looking to apply an already optimized LLM without needing to perform the alignment process yourself.
Stars: 583
Forks: 47
Language: Python
License: Apache-2.0
Last pushed: Jan 23, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/uclaml/SPPO"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
stair-lab/mlhp
Machine Learning from Human Preferences
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
general-preference/general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment...
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
line/sacpo
[NeurIPS 2024] SACPO (Stepwise Alignment for Constrained Policy Optimization)