CarperAI/trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Quality score: 46 / 100 (Emerging)

This project helps AI engineers fine-tune large language models (LLMs) for specific tasks by incorporating human feedback or predefined reward signals. You provide an existing language model and either a function that scores its outputs or a dataset of example outputs with desired scores. trlx then tunes the model so its future outputs score higher, aligning it with those preferences.
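For a sense of the workflow, here is a minimal sketch of both entry points using trlx's documented trlx.train API. The checkpoint name, toy reward function, and example strings are illustrative placeholders, not taken from the project:

import trlx

# 1) Online RL (PPO by default): trlx samples completions for the prompts
# and updates the model to maximize the scores reward_fn assigns to them.
# The word-counting reward below is a hypothetical stand-in for a real
# preference or reward model.
trainer = trlx.train(
    "gpt2",  # any Hugging Face causal LM checkpoint
    reward_fn=lambda samples, **kwargs: [float(s.count("helpful")) for s in samples],
    prompts=["Explain what a transformer is:"] * 64,
)

# 2) Offline (ILQL by default): fit to pre-collected texts labeled with
# the scores they should have received; no live reward function needed.
trainer = trlx.train(
    "gpt2",
    samples=["a curt reply", "a thorough, polite reply"],
    rewards=[0.1, 1.0],
)

# Export the tuned model as an ordinary Hugging Face checkpoint.
trainer.save_pretrained("out/tuned-model")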

4,738 stars. No commits in the last 6 months.

Use this if you need to fine-tune a large language model to generate text that better matches specific human preferences or a defined reward function, especially for models up to 20 billion parameters (or larger, given specialized multi-GPU hardware).

Not ideal if you want an out-of-the-box solution: it assumes working knowledge of large language model training and distributed computing.

large-language-model-finetuning reinforcement-learning-for-nlp ai-model-customization natural-language-generation text-generation-refinement
Status: Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 20 / 25

The overall score is the sum of the four 25-point categories: 0 + 10 + 16 + 20 = 46.

Stars: 4,738
Forks: 482
Language: Python
License: MIT
Last pushed: Jan 08, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/CarperAI/trlx"

Open to everyone: 100 requests/day with no API key required. Get a free key for 1,000 requests/day.
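The same data can be fetched programmatically. A minimal Python sketch, assuming the endpoint returns JSON (the exact response fields are not documented here):

import requests

# Fetch the quality-score record for CarperAI/trlx (no API key needed
# at the free tier of 100 requests/day).
url = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/CarperAI/trlx"
resp = requests.get(url, timeout=30)
resp.raise_for_status()  # fail loudly on rate limiting or server errors
print(resp.json())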