RLHFlow/Online-RLHF
A recipe for online RLHF and online iterative DPO.
This project provides a recipe for improving large language models (LLMs) through continuous learning from human feedback: starting from an existing LLM and human preference data (e.g., rankings of model responses), it iteratively fine-tunes the model toward better alignment. It targets machine learning engineers and researchers who deploy and refine LLMs.
543 stars. No commits in the last 6 months.
Use this if you need to fine-tune an LLM beyond its initial training to better align with specific human preferences or complex instructions, especially for production environments requiring iterative improvements.
Not ideal if you are looking for a simple, off-the-shelf LLM for basic tasks without needing advanced customization or continuous performance optimization.
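The summary above centers on online iterative DPO. As a minimal sketch of the per-pair DPO objective such recipes iterate on (the log-probability values and beta below are toy assumptions for illustration, not values from this repo):

```python
# Minimal sketch of the Direct Preference Optimization (DPO) loss for one
# preference pair. All numeric values are toy assumptions; a real recipe
# would take log-probabilities from the policy and a frozen reference model.
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Toy example: the policy already slightly prefers the chosen response
# relative to the reference model, so the loss is below log(2).
loss = dpo_loss(logp_chosen=-4.0, logp_rejected=-6.0,
                ref_chosen=-5.0, ref_rejected=-5.5)
```

The larger the policy's preference margin for the chosen response over the reference model's, the closer the loss gets to zero; online iteration repeats this update on freshly collected preference pairs.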
Stars: 543
Forks: 48
Language: Python
License: —
Category:
Last pushed: Dec 28, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/RLHFlow/Online-RLHF"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
agentscope-ai/Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF
nanoRLHF: from-scratch journey into how LLMs and RLHF really work.