sahsaeedi/TPO
[TMLR] Triple Preference Optimization
This repository implements Triple Preference Optimization (TPO), a fine-tuning method that improves the instruction-following and reasoning capabilities of large language models (LLMs). Starting from a pre-trained model, applying TPO produces an aligned model that delivers better responses. It is intended for AI researchers and practitioners who fine-tune or develop custom LLMs.
No commits in the last 6 months.
Use this if you are an AI researcher or machine learning engineer looking to improve the instruction-following and reasoning capabilities of large language models efficiently.
Not ideal if you are an end-user simply looking to use an off-the-shelf LLM or if you lack the technical expertise to train and fine-tune models.
Stars: 30
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Feb 19, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sahsaeedi/TPO"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
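For programmatic access, here is a minimal Python sketch of the same request using only the standard library. The endpoint URL is taken from the curl example above; the shape of the JSON response is not documented here, so the sketch simply prints whatever the API returns rather than assuming specific field names.

# Minimal sketch: fetch the quality data for sahsaeedi/TPO with Python's stdlib.
# The endpoint comes from the curl example above; the response schema is an
# assumption (treated as opaque JSON), so we just pretty-print it.
import json
import urllib.request

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/sahsaeedi/TPO"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

# Inspect whatever fields the API actually returns.
print(json.dumps(data, indent=2))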
Higher-rated alternatives
stair-lab/mlhp
Machine Learning from Human Preferences
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
general-preference/general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment...
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards