eliahuhorwitz/Spectral-DeTuning

Official PyTorch Implementation for the "Recovering the Pre-Fine-Tuning Weights of Generative Models" paper (ICML 2024).

Quality score: 34 / 100 (Emerging)

This tool helps AI security researchers and red teamers identify vulnerabilities in fine-tuned generative AI models. Given multiple LoRA (Low-Rank Adaptation) fine-tuned models that originated from the same base model, it recovers the weights of the original, pre-fine-tuning source model, even without access to the individual low-rank decompositions of the fine-tuned models.
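The recovery idea can be sketched as an alternating minimization: each fine-tuned matrix is modeled as the shared pre-fine-tuning matrix plus a rank-r LoRA residual, and the method alternates between a truncated-SVD estimate of each residual and a mean estimate of the shared matrix. This is a minimal NumPy illustration of that idea, not the official implementation; the function name, dimensions, and number of models are illustrative assumptions.

```python
import numpy as np

def recover_pre_ft(weights, rank, iters=100):
    """Sketch of iterative pre-fine-tuning weight recovery.

    weights: list of (d, k) fine-tuned matrices, each assumed to be
             W_pre + B_i @ A_i with a rank-`rank` LoRA update.
    """
    w_hat = np.mean(weights, axis=0)  # initial guess for W_pre
    for _ in range(iters):
        corrected = []
        for w in weights:
            # Best rank-r approximation of this model's residual (truncated SVD).
            u, s, vt = np.linalg.svd(w - w_hat, full_matrices=False)
            low_rank = u[:, :rank] * s[:rank] @ vt[:rank]
            corrected.append(w - low_rank)
        # Given the residual estimates, the shared matrix is the mean.
        w_hat = np.mean(corrected, axis=0)
    return w_hat

# Toy usage: ten synthetic "LoRA fine-tuned" copies of a random base matrix.
rng = np.random.default_rng(0)
w_pre = rng.standard_normal((64, 32))
models = [w_pre + rng.standard_normal((64, 2)) @ rng.standard_normal((2, 32))
          for _ in range(10)]
w_rec = recover_pre_ft(models, rank=2)
print(np.linalg.norm(w_rec - w_pre) / np.linalg.norm(w_pre))
```

Both alternating steps are optimal for their subproblem (truncated SVD for the rank-constrained residual, mean for the shared matrix), so the reconstruction objective decreases monotonically; on this toy problem the recovered matrix lands far closer to the true base weights than a naive average of the fine-tuned models.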

No commits in the last 6 months.

Use this if you need to demonstrate or investigate how an attacker could recover the original, potentially unsafe, weights of a generative AI model after it has been fine-tuned for safety or other purposes.

Not ideal if you are looking to fine-tune models, optimize model performance, or perform standard model evaluation rather than a security analysis.

Tags: AI security, red teaming, model vulnerability assessment, generative AI, large language models
Badges: Stale (6m) · No Package · No Dependents
Maintenance: 2 / 25
Adoption: 9 / 25
Maturity: 16 / 25
Community: 7 / 25


Stars: 85
Forks: 4
Language: Python
License:
Category: llm-fine-tuning
Last pushed: Apr 15, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/eliahuhorwitz/Spectral-DeTuning"

Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.