RLHFlow/Online-RLHF

A recipe for online RLHF and online iterative DPO.

Quality score: 34 / 100 (Emerging)

This project provides a recipe for improving large language models (LLMs) by learning continuously from human feedback. Starting from an existing LLM and human preference data (e.g., rankings of model responses), it iteratively fine-tunes the model to align more closely with those preferences. It is aimed at machine learning engineers and researchers who deploy and refine LLMs.
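
The training signal behind iterative DPO is the standard DPO objective. As a rough illustration only, here is a minimal PyTorch sketch with placeholder names and a placeholder beta; it is not necessarily this repo's exact implementation:

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Each input is a tensor of summed token log-probabilities for a batch
    # of (prompt, response) pairs; "chosen" responses were preferred by
    # the annotator, "rejected" were not.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between preferred and rejected responses,
    # measured relative to the frozen reference model.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

In the online variant, each round samples fresh responses from the current policy, ranks them (via human labels or a reward model), and runs another round of DPO training on the new preference pairs.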

543 stars. No commits in the last 6 months.

Use this if you need to fine-tune an LLM beyond its initial training to better align with specific human preferences or complex instructions, especially for production environments requiring iterative improvements.

Not ideal if you are looking for a simple, off-the-shelf LLM for basic tasks without needing advanced customization or continuous performance optimization.

Tags: LLM Alignment, Reinforcement Learning, Model Fine-tuning, AI Research, Natural Language Processing
Badges: No License, Stale (6 months), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 8 / 25
Community: 16 / 25

How are scores calculated?
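
The four 25-point subscores above sum to the overall score; a quick sanity check in Python (the unweighted sum is inferred from the numbers shown, not from documented scoring rules):

subscores = {"Maintenance": 0, "Adoption": 10, "Maturity": 8, "Community": 16}
assert sum(subscores.values()) == 34  # matches the 34 / 100 badge above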

Stars: 543
Forks: 48
Language: Python
License: None
Last pushed: Dec 28, 2024
Commits (last 30 days): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/RLHFlow/Online-RLHF"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
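
The same endpoint can also be called from Python; a minimal sketch using requests, where the shape of the JSON response is an assumption rather than a documented schema:

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/RLHFlow/Online-RLHF"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()  # assumed to contain the stats shown above, e.g. stars and score
print(data)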