CarperAI/trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
This project helps AI engineers refine large language models (LLMs) to perform specific tasks better by incorporating human feedback or predefined reward signals. You provide an existing language model and either a way to score its outputs or examples with desired scores. The project then tunes the model so its future outputs align with these preferences, yielding a customized, high-performing LLM.
4,738 stars. No commits in the last 6 months.
Use this if you need to fine-tune a large language model so its outputs better align with specific human preferences or a defined reward function; it scales to models of roughly 20 billion parameters, and beyond with specialized hardware and distributed setups.
Not ideal if you are looking for an out-of-the-box solution: using it effectively requires deep technical knowledge of large language model training and distributed computing.
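The "way to score its outputs" mentioned above is typically a reward function: a callable that maps a batch of generated texts to scalar scores, which the trainer then maximizes. A minimal sketch is below; the scoring rule itself (reward a politeness marker, penalize length) is invented purely for illustration, and a real setup would use a trained reward model or human preference data instead.

```python
from typing import List

def reward_fn(samples: List[str], **kwargs) -> List[float]:
    """Score each generated sample; higher is better.

    The rule here is a toy example: +1.0 for containing "please",
    minus a mild penalty per word to discourage rambling.
    """
    rewards = []
    for text in samples:
        score = 0.0
        if "please" in text.lower():
            score += 1.0
        score -= 0.01 * len(text.split())  # mild length penalty
        rewards.append(score)
    return rewards
```

A function with this shape (batch of strings in, list of floats out) is what RLHF libraries in this space generally expect as a reward signal.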
Stars
4,738
Forks
482
Language
Python
License
MIT
Category
ML Frameworks
Last pushed
Jan 08, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/CarperAI/trlx"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
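The same endpoint can be called from Python instead of curl. A minimal sketch, assuming the endpoint returns JSON (the response's field names are not documented here, so only the URL construction below is taken from the listing; inspect the actual payload before relying on specific keys):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Builds the endpoint URL shown in the curl example above.
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    # Assumption: the endpoint returns a JSON object. Unauthenticated
    # calls are rate-limited to 100 requests/day per the note above.
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)
```

For example, `fetch_quality("ml-frameworks", "CarperAI", "trlx")` would request the URL from the curl example.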
Higher-rated alternatives
DLR-RM/stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
google-deepmind/dm_control
Google DeepMind's software stack for physics-based simulation and Reinforcement Learning...
Denys88/rl_games
RL implementations
pytorch/rl
A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
yandexdataschool/Practical_RL
A course in reinforcement learning in the wild