declare-lab/resta

Restore safety in fine-tuned language models through task arithmetic

19 / 100 · Experimental

This project helps machine learning engineers and researchers make their fine-tuned language models safer and less likely to generate harmful content. It takes a language model that has been fine-tuned for a specific task but may have lost some of its safety alignment, and, by adding a 'safety vector' to its weights, produces a new version that keeps its task performance while significantly reducing harmful outputs. It is aimed at professionals building or deploying custom Large Language Models.
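To illustrate the idea, here is a minimal sketch of restoring safety through task arithmetic. The model paths, the scaling factor lam, and the construction of the safety vector (aligned base weights minus unaligned base weights) are illustrative assumptions, not the project's exact recipe.

import torch
from transformers import AutoModelForCausalLM

# Load three models with the same architecture (paths are hypothetical placeholders).
aligned   = AutoModelForCausalLM.from_pretrained("path/to/safety-aligned-base")
unaligned = AutoModelForCausalLM.from_pretrained("path/to/unaligned-base")
finetuned = AutoModelForCausalLM.from_pretrained("path/to/your-task-finetuned-model")

lam = 0.5  # assumed scaling factor: how strongly the safety correction is applied

aligned_sd, unaligned_sd = aligned.state_dict(), unaligned.state_dict()

with torch.no_grad():
    for name, param in finetuned.named_parameters():
        # Safety vector for this parameter: aligned weights minus unaligned weights.
        safety_vec = aligned_sd[name] - unaligned_sd[name]
        # Task arithmetic: add the scaled safety vector into the fine-tuned weights.
        param.add_(lam * safety_vec)

finetuned.save_pretrained("restored-model")

In this sketch, the scaling factor trades off how much safety behavior is restored against how far the weights drift from the task-specific fine-tune.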

No commits in the last 6 months.

Use this if you have fine-tuned a large language model for a specific task and are concerned that it now generates unsafe, biased, or harmful responses.

Not ideal if you are looking for a pre-built, ready-to-use safe language model rather than a method to improve the safety of your own custom models.

AI Safety · Large Language Models · Model Alignment · NLP Engineering · Content Moderation
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 8 / 25
Community 4 / 25


Stars: 32
Forks: 1
Language: Python
License: None
Last pushed: Mar 28, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/declare-lab/resta"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
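For programmatic use, here is a minimal Python sketch of the same request, assuming the endpoint returns JSON; the exact fields in the response are not assumed here.

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/declare-lab/resta"
resp = requests.get(url, timeout=10)  # unauthenticated access: 100 requests/day
resp.raise_for_status()
print(resp.json())  # quality scores and repo stats as returned by the API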