holarissun/RewardModelingBeyondBradleyTerry

Official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and Alternatives".

Quality score: 34 / 100 (Emerging)

This project helps AI researchers and practitioners develop and test reward models for large language models. It provides a way to train and evaluate reward models using pre-generated embedding data, which dramatically reduces the need for expensive GPUs. The input is a dataset of language model responses with their associated quality annotations; the output is a trained reward model that can assess response quality efficiently.
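
As a rough illustration of this workflow, the sketch below trains a lightweight reward head on pre-computed response embeddings with a standard Bradley-Terry pairwise loss. This is not the repository's actual training script: the embedding dimension, batch layout, and all variable names are assumptions made for illustration.

# Hedged sketch: a reward head over pre-generated embeddings with a
# Bradley-Terry pairwise objective. NOT the repo's actual code; the
# embedding dimension and data shapes are assumptions.
import torch
import torch.nn as nn

emb_dim = 4096                                # assumed encoder width
reward_head = nn.Linear(emb_dim, 1)           # small head; trains fine on CPU
opt = torch.optim.Adam(reward_head.parameters(), lr=1e-3)

def bradley_terry_loss(chosen_emb, rejected_emb):
    # -log sigmoid(r(chosen) - r(rejected)), the standard BT pairwise loss
    margin = reward_head(chosen_emb) - reward_head(rejected_emb)
    return -torch.nn.functional.logsigmoid(margin).mean()

# Dummy tensors standing in for offline-computed embedding pairs.
chosen = torch.randn(32, emb_dim)
rejected = torch.randn(32, emb_dim)

for step in range(100):
    opt.zero_grad()
    loss = bradley_terry_loss(chosen, rejected)
    loss.backward()
    opt.step()

Because the expensive encoder pass happens offline when the embeddings are generated, only the small head is optimized here, which is why training and evaluation can avoid high-end GPUs.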

No commits in the last 6 months.

Use this if you are an AI researcher or practitioner looking to conduct reward modeling research for large language models without needing high-end GPUs for training and evaluation.

Not ideal if you are looking to generate new response data or annotations from scratch, as those steps still require significant computational resources like GPUs.

Tags: AI research, large language models, reinforcement learning from human feedback, reward modeling, natural language processing
Status: Stale (6 months) · No package · No dependents
Maintenance: 0 / 25
Adoption: 9 / 25
Maturity: 16 / 25
Community: 9 / 25

Stars: 71
Forks: 5
Language: Python
License: MIT
Last pushed: Apr 02, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/holarissun/RewardModelingBeyondBradleyTerry"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
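
For programmatic use beyond curl, a minimal Python sketch is below. The endpoint URL comes from the command above, but the shape of the JSON response is an assumption, so the snippet only prints the raw payload for inspection.

# Hedged sketch: fetch the quality card as JSON. Endpoint is from the
# curl command above; response fields are unknown, so just dump the payload.
import json
import urllib.request

url = ("https://pt-edge.onrender.com/api/v1/quality/transformers/"
       "holarissun/RewardModelingBeyondBradleyTerry")
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))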