tomekkorbak/pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
This project helps machine learning engineers refine large language models (LLMs) to better align with specific human preferences, such as avoiding toxicity, leaking personally identifiable information (PII), or violating code style rules. You provide training data annotated with "misalignment scores," and the project outputs a finetuned LLM that produces text more consistent with those preferences. It is designed for practitioners working on language model development and safety.
180 stars. No commits in the last 6 months.
Use this if you are a machine learning engineer who wants to pretrain or finetune large language models to reduce undesirable outputs, scored against human preferences with metrics such as toxicity, PII detection, or code style compliance.
Not ideal if you are an end-user simply looking for a ready-to-use, perfectly aligned language model without custom training or model development expertise.
Stars
180
Forks
14
Language
Python
License
MIT
Category
Last pushed
Feb 13, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tomekkorbak/pretraining-with-human-feedback"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
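The same endpoint can be called programmatically. Below is a minimal Python sketch using only the standard library; it assumes the endpoint returns a JSON body (the response schema is not documented here, so treat the decoded dict's keys as unknown until inspected). The `quality_url` helper and `fetch_quality` function names are illustrative, not part of any official client.

```python
import json
import urllib.request

# Base path taken from the curl example above; no API key is needed
# for up to 100 requests/day.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the API URL for a given GitHub owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality record (makes a network call;
    assumes a JSON response body)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)
```

For example, `fetch_quality("tomekkorbak", "pretraining-with-human-feedback")` would request the same URL as the curl command shown above.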
Higher-rated alternatives
agentscope-ai/Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF
nanoRLHF: from-scratch journey into how LLMs and RLHF really work.