git-disl/Lisa

This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS 2024).

Quality score: 23 / 100 (Experimental)

This project helps machine learning engineers and researchers make large language models (LLMs) safer by defending against harmful fine-tuning. Given a pre-trained LLM and a fine-tuning dataset, it produces a fine-tuned model that is more resistant to generating unsafe content, even when the training data is malicious. The primary users are those responsible for deploying and maintaining safe AI systems.
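For intuition, here is a minimal, hypothetical sketch of the alternating-optimization idea the paper's title suggests: training "lazily" switches between safety-alignment data and the user's fine-tuning data, with a proximal term pulling parameters toward a recent snapshot. This is not the repository's actual training code; the toy model, make_batch, prox_coeff, and switch_every are invented for illustration.

import torch
import torch.nn as nn

# Toy stand-ins; a real run would use an LLM and tokenized text batches.
model = nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def make_batch():
    return torch.randn(8, 16), torch.randint(0, 2, (8,))

user_batches = [make_batch() for _ in range(20)]       # possibly harmful task data
alignment_batches = [make_batch() for _ in range(20)]  # safe alignment data

prox_coeff = 0.1   # strength of the proximal pull (assumed hyperparameter)
switch_every = 5   # alternate data sources every K steps (assumed schedule)

# Parameter snapshot taken at each switch; the proximal term penalizes drift.
anchor = [p.detach().clone() for p in model.parameters()]

for step, (user_b, align_b) in enumerate(zip(user_batches, alignment_batches)):
    if step % switch_every == 0:
        anchor = [p.detach().clone() for p in model.parameters()]
    # Alternate which dataset drives the gradient.
    x, y = user_b if (step // switch_every) % 2 == 0 else align_b
    loss = loss_fn(model(x), y)
    # Proximal regularization toward the anchored parameters.
    loss = loss + prox_coeff * sum(
        (p - a).pow(2).sum() for p, a in zip(model.parameters(), anchor)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()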

No commits in the last 6 months.

Use this if you are a machine learning engineer concerned about your large language models being manipulated by harmful fine-tuning data.

Not ideal if you are looking for a defense against harmful content at the pre-training or post-fine-tuning stages, as this tool specifically targets the fine-tuning process.

Tags: AI safety, large language models, model fine-tuning, responsible AI, AI security
Status: Stale (6 months), no package published, no dependents
Score breakdown:
- Maintenance: 0 / 25
- Adoption: 7 / 25
- Maturity: 16 / 25
- Community: 0 / 25


Stars: 26
Forks:
Language: Python
License: Apache-2.0
Last pushed: Sep 10, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/git-disl/Lisa"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
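The same request from Python, assuming the endpoint returns JSON; the response schema and the way an API key is supplied are not documented here.

import requests

# Fetch the quality record for git-disl/Lisa; assumes a JSON response body.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/git-disl/Lisa"
resp = requests.get(url, timeout=30)
resp.raise_for_status()  # surface HTTP errors such as rate limiting
print(resp.json())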