ssbuild/llm_rlhf
Reinforcement learning training for GPT-2, LLaMA, BLOOM, and other LLMs.
This project helps machine learning engineers and researchers improve the responses of large language models (LLMs) such as GPT-2, LLaMA, and BLOOM. It takes an existing pre-trained model and your own training data, then applies reinforcement learning from human feedback (RLHF) to refine the model's behavior. The output is a fine-tuned model whose responses are better aligned with your specific needs.
No commits in the last 6 months.
Use this if you are a machine learning engineer looking to fine-tune pre-trained large language models to produce more desirable and human-aligned responses for specific applications.
Not ideal if you are not familiar with machine learning concepts, model training, or command-line interfaces, as this is a technical tool for developers.
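For readers new to RLHF, the "human feedback" in the description above usually enters as preference pairs: a reward model is trained to score the preferred response higher than the rejected one, and that reward then guides the reinforcement learning step. The sketch below shows the standard pairwise (Bradley-Terry) preference loss in plain Python. It is a generic illustration, not code taken from ssbuild/llm_rlhf, and the function name `preference_loss` is hypothetical.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Hypothetical helper for illustration only; not part of ssbuild/llm_rlhf.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# The loss shrinks as the reward model ranks the preferred response higher.
loss_agrees = preference_loss(2.0, 0.0)     # model agrees with the human label
loss_disagrees = preference_loss(0.0, 2.0)  # model disagrees with the human label
```

A reward model trained by minimizing this loss over many labeled pairs learns to assign higher scores to responses that people prefer.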
Stars: 27
Forks: 2
Language: Python
License: —
Category:
Last pushed: Sep 19, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ssbuild/llm_rlhf"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
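The same request can be made from Python using only the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the path segments after `quality/` are category, owner, and repository name (inferred from the curl URL above), and that the endpoint returns a JSON body. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (assumed layout: /quality/<category>/<owner>/<repo>)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response is JSON. Its schema is not documented on this page,
    so the parsed object is returned as-is.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("transformers", "ssbuild", "llm_rlhf")` issues the same GET request as the curl command. Without a key, requests count against the 100 per day limit.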
Higher-rated alternatives
agentscope-ai/Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF
nanoRLHF: from-scratch journey into how LLMs and RLHF really work.