CarperAI/trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
This project helps AI engineers refine large language models (LLMs) to perform specific tasks better by incorporating human feedback or predefined reward signals. You provide an existing language model and either a way to score its outputs or examples with desired scores. The project then tunes the model so its future outputs align with these preferences, yielding a customized, high-performing LLM.
4,738 stars. No commits in the last 6 months.
Use this if you need to fine-tune a large language model so its outputs better align with specific human preferences or a defined reward function; it scales to models of roughly 20 billion parameters, and beyond with specialized hardware and distributed setups.
Not ideal if you are looking for an out-of-the-box solution: using it effectively requires deep technical knowledge of large language model training and distributed computing.
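The "way to score its outputs" mentioned above is typically a reward function: a callable that maps a batch of generated texts to scalar scores, which the trainer then maximizes. A minimal sketch is below; the scoring rule itself (reward a politeness marker, penalize length) is invented purely for illustration, and a real setup would use a trained reward model or human preference data instead.

```python
from typing import List

def reward_fn(samples: List[str], **kwargs) -> List[float]:
    """Score each generated sample; higher is better.

    The rule here is a toy example: +1.0 for containing "please",
    minus a mild penalty per word to discourage rambling.
    """
    rewards = []
    for text in samples:
        score = 0.0
        if "please" in text.lower():
            score += 1.0
        score -= 0.01 * len(text.split())  # mild length penalty
        rewards.append(score)
    return rewards
```

A function with this shape (batch of strings in, list of floats out) is what RLHF libraries in this space generally expect as a reward signal.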
Stars
4,738
Forks
482
Language
Python
License
MIT
Category
ML Frameworks
Last pushed
Jan 08, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/CarperAI/trlx"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
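The same endpoint can be called from Python instead of curl. A minimal sketch, assuming the endpoint returns JSON (the response's field names are not documented here, so only the URL construction below is taken from the listing; inspect the actual payload before relying on specific keys):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Builds the endpoint URL shown in the curl example above.
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    # Assumption: the endpoint returns a JSON object. Unauthenticated
    # calls are rate-limited to 100 requests/day per the note above.
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)
```

For example, `fetch_quality("ml-frameworks", "CarperAI", "trlx")` would request the URL from the curl example.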
Higher-rated alternatives
DLR-RM/stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
google-deepmind/dm_control
Google DeepMind's software stack for physics-based simulation and Reinforcement Learning...
Denys88/rl_games
RL implementations
pytorch/rl
A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
yandexdataschool/Practical_RL
A course in reinforcement learning in the wild