eliahuhorwitz/Spectral-DeTuning

Official PyTorch Implementation for the "Recovering the Pre-Fine-Tuning Weights of Generative Models" paper (ICML 2024).

Quality score: 34 / 100 (Emerging)

This tool helps AI security researchers and red teamers identify vulnerabilities in fine-tuned generative AI models. Given multiple LoRA (Low-Rank Adaptation) fine-tuned models that originated from the same base model, it recovers the weights of the original, pre-fine-tuning source model, even without access to the individual low-rank decompositions of the fine-tuned models.
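The recovery idea can be sketched as an alternating minimization: each fine-tuned matrix is modeled as the shared pre-fine-tuning matrix plus a rank-r LoRA residual, and the method alternates between a truncated-SVD estimate of each residual and a mean estimate of the shared matrix. This is a minimal NumPy illustration of that idea, not the official implementation; the function name, dimensions, and number of models are illustrative assumptions.

```python
import numpy as np

def recover_pre_ft(weights, rank, iters=100):
    """Sketch of iterative pre-fine-tuning weight recovery.

    weights: list of (d, k) fine-tuned matrices, each assumed to be
             W_pre + B_i @ A_i with a rank-`rank` LoRA update.
    """
    w_hat = np.mean(weights, axis=0)  # initial guess for W_pre
    for _ in range(iters):
        corrected = []
        for w in weights:
            # Best rank-r approximation of this model's residual (truncated SVD).
            u, s, vt = np.linalg.svd(w - w_hat, full_matrices=False)
            low_rank = u[:, :rank] * s[:rank] @ vt[:rank]
            corrected.append(w - low_rank)
        # Given the residual estimates, the shared matrix is the mean.
        w_hat = np.mean(corrected, axis=0)
    return w_hat

# Toy usage: ten synthetic "LoRA fine-tuned" copies of a random base matrix.
rng = np.random.default_rng(0)
w_pre = rng.standard_normal((64, 32))
models = [w_pre + rng.standard_normal((64, 2)) @ rng.standard_normal((2, 32))
          for _ in range(10)]
w_rec = recover_pre_ft(models, rank=2)
print(np.linalg.norm(w_rec - w_pre) / np.linalg.norm(w_pre))
```

Both alternating steps are optimal for their subproblem (truncated SVD for the rank-constrained residual, mean for the shared matrix), so the reconstruction objective decreases monotonically; on this toy problem the recovered matrix lands far closer to the true base weights than a naive average of the fine-tuned models.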

No commits in the last 6 months.

Use this if you need to demonstrate or investigate how an attacker could recover the original, potentially unsafe, weights of a generative AI model after it has been fine-tuned for safety or other purposes.

Not ideal if you are looking to fine-tune models, optimize model performance, or perform standard model evaluation rather than a security analysis.

Tags: AI security, red teaming, model vulnerability assessment, generative AI, large language models
Badges: Stale (6m) · No Package · No Dependents
Maintenance: 2 / 25
Adoption: 9 / 25
Maturity: 16 / 25
Community: 7 / 25


Stars: 85
Forks: 4
Language: Python
License:
Category: llm-fine-tuning
Last pushed: Apr 15, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/eliahuhorwitz/Spectral-DeTuning"

Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.