tomekkorbak/pretraining-with-human-feedback

Code accompanying the paper Pretraining Language Models with Human Preferences

Quality score: 38 / 100 (Emerging)

This project helps machine learning engineers refine large language models (LLMs) to better align with specific human preferences, such as avoiding toxicity, personal information leaks, or code style violations. You provide training data annotated with 'misalignment scores,' and the project outputs a finetuned LLM that produces text more consistent with those preferences. It is designed for practitioners working on language model development and safety.
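For a sense of what "misalignment scores" look like in practice, here is a minimal Python sketch of the conditional-training recipe described in the paper: score each training segment, then prepend a control token so the model learns to condition on alignment. The token strings and the score_misalignment stub are illustrative assumptions, not this repository's actual code; the paper scored toxicity with a classifier, PII with a detector, and code style with a PEP8 checker.

GOOD, BAD = "<|good|>", "<|bad|>"  # assumed token strings, for illustration

def score_misalignment(text: str) -> float:
    """Placeholder misalignment scorer in [0, 1]; swap in a real
    classifier (toxicity, PII, code style) for actual use."""
    return 0.0

def tag_segment(text: str, threshold: float = 0.5) -> str:
    """Prepend a control token based on the segment's misalignment score."""
    token = BAD if score_misalignment(text) > threshold else GOOD
    return f"{token}{text}"

corpus = ["A harmless sentence.", "Another training segment."]
tagged = [tag_segment(seg) for seg in corpus]
# Pretrain on `tagged`; at inference time, prompt with <|good|> to steer
# generation toward text the scorer considers aligned.

Conditional training is only one of the objectives the paper evaluates; filtering (dropping high-scoring segments outright) is another.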

180 stars. No commits in the last 6 months.

Use this if you are a machine learning engineer who wants to pretrain or finetune large language models to reduce undesirable outputs, scoring human preferences with metrics such as toxicity, PII detection, or code style compliance.

Not ideal if you are an end-user simply looking for a ready-to-use, perfectly aligned language model without custom training or model development expertise.

Tags: Large Language Models · NLP · Safety · Model Alignment · AI Ethics · Custom LLM Training
Stale 6m · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 12 / 25
(The four category scores sum to the overall 38 / 100.)


Stars: 180
Forks: 14
Language: Python
License: MIT
Last pushed: Feb 13, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tomekkorbak/pretraining-with-human-feedback"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
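The same request from Python, a minimal sketch using the requests library and assuming the endpoint returns JSON:

import requests

url = (
    "https://pt-edge.onrender.com/api/v1/quality/"
    "transformers/tomekkorbak/pretraining-with-human-feedback"
)
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(resp.json())  # assumed to be a JSON payload with the scores shown above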