tomekkorbak/pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
This project helps machine learning engineers refine large language models (LLMs) to better align with specific human preferences, such as avoiding toxicity, leaking personally identifiable information (PII), or violating code style rules. You provide training data annotated with "misalignment scores," and the project outputs a finetuned LLM that produces text more consistent with those preferences. It is designed for practitioners working on language model development and safety.
180 stars. No commits in the last 6 months.
Use this if you are a machine learning engineer who wants to pretrain or finetune large language models to reduce undesirable outputs, scored against human preferences with metrics such as toxicity, PII detection, or code style compliance.
Not ideal if you are an end-user simply looking for a ready-to-use, perfectly aligned language model without custom training or model development expertise.
Stars
180
Forks
14
Language
Python
License
MIT
Category
Last pushed
Feb 13, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tomekkorbak/pretraining-with-human-feedback"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
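The same endpoint can be called programmatically. Below is a minimal Python sketch using only the standard library; it assumes the endpoint returns a JSON body (the response schema is not documented here, so treat the decoded dict's keys as unknown until inspected). The `quality_url` helper and `fetch_quality` function names are illustrative, not part of any official client.

```python
import json
import urllib.request

# Base path taken from the curl example above; no API key is needed
# for up to 100 requests/day.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the API URL for a given GitHub owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality record (makes a network call;
    assumes a JSON response body)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)
```

For example, `fetch_quality("tomekkorbak", "pretraining-with-human-feedback")` would request the same URL as the curl command shown above.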
Higher-rated alternatives
agentscope-ai/Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF
nanoRLHF: from-scratch journey into how LLMs and RLHF really work.