sahsaeedi/TPO
[TMLR] Triple Preference Optimization
This repository implements Triple Preference Optimization (TPO), a fine-tuning method that improves the instruction-following and reasoning capabilities of large language models (LLMs). Starting from a pre-trained model, applying TPO produces an aligned model that delivers better responses. It is intended for AI researchers and practitioners who fine-tune or develop custom LLMs.
No commits in the last 6 months.
Use this if you are an AI researcher or machine learning engineer looking to improve the instruction-following and reasoning capabilities of large language models efficiently.
Not ideal if you are an end-user simply looking to use an off-the-shelf LLM or if you lack the technical expertise to train and fine-tune models.
Stars: 30
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Feb 19, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sahsaeedi/TPO"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
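For programmatic access, here is a minimal Python sketch of the same request using only the standard library. The endpoint URL is taken from the curl example above; the shape of the JSON response is not documented here, so the sketch simply prints whatever the API returns rather than assuming specific field names.

# Minimal sketch: fetch the quality data for sahsaeedi/TPO with Python's stdlib.
# The endpoint comes from the curl example above; the response schema is an
# assumption (treated as opaque JSON), so we just pretty-print it.
import json
import urllib.request

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/sahsaeedi/TPO"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

# Inspect whatever fields the API actually returns.
print(json.dumps(data, indent=2))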
Higher-rated alternatives
stair-lab/mlhp
Machine Learning from Human Preferences
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
general-preference/general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment...
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards