general-preference/general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment (https://arxiv.org/abs/2410.02197)
This project helps AI developers and researchers improve the alignment of large language models (LLMs). It takes preference data, where humans have ranked or compared different LLM responses, and uses it to train a General Preference Model (GPM). The GPM provides a more expressive way to evaluate and align LLMs than Bradley-Terry reward models, which cannot represent intransitive (cyclic) preferences.
No commits in the last 6 months.
Use this if you are an AI developer or researcher looking to improve how you train and evaluate large language models based on human feedback.
Not ideal if you are a business user looking for a no-code solution to apply existing LLMs, as this is a tool for building and refining the underlying models.
Stars: 39
Forks: 5
Language: Python
License: Apache-2.0
Category:
Last pushed: Sep 08, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/general-preference/general-preference-model"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
stair-lab/mlhp
Machine Learning from Human Preferences
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
line/sacpo
[NeurIPS 2024] SACPO (Stepwise Alignment for Constrained Policy Optimization)