l294265421/alpaca-rlhf
Finetuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat
This project helps fine-tune large language models (LLMs) like LLaMA to better align with human preferences. It takes an existing LLaMA model and human feedback data, and outputs a more helpful and harmless conversational AI model. This is for AI developers, researchers, and engineers working on building or improving custom chat applications and intelligent assistants.
117 stars. No commits in the last 6 months.
Use this if you want to enhance the quality and safety of a LLaMA-based conversational AI by incorporating human feedback during the training process.
Not ideal if you are looking for a pre-trained, ready-to-use chatbot without needing to perform custom model training and finetuning.
Stars: 117
Forks: 15
Language: Python
License: MIT
Category:
Last pushed: Jun 05, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/l294265421/alpaca-rlhf"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
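The same endpoint can also be called from Python. A minimal standard-library sketch, assuming the response is JSON and that keyed access uses a bearer-style header (the listing documents only the URL and the rate limits):

```python
import json
import urllib.request
from typing import Optional

# Endpoint taken from the listing above; the response schema and the
# auth header name for keyed access are assumptions, not documented here.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-report URL for a GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str, api_key: Optional[str] = None) -> dict:
    """Fetch and decode the JSON report (100 requests/day without a key)."""
    req = urllib.request.Request(quality_url(owner, repo))
    if api_key:
        # Hypothetical header scheme; check the API's docs for the real one.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# URL for this repository, matching the curl command above:
print(quality_url("l294265421", "alpaca-rlhf"))
```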
Higher-rated alternatives
agentscope-ai/Trinity-RFT - Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF - An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit - [ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook - Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF - from-scratch journey into how LLMs and RLHF really work.