OPTML-Group/Diffusion-MU-Attack
The official implementation of the ECCV'24 paper "To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now". This work introduces a fast and effective attack method for evaluating how readily safety-driven unlearned diffusion models can still be made to generate harmful content.
This project helps AI safety researchers and model developers assess how effectively their "safety-driven unlearned" image generation models have forgotten unwanted concepts such as nudity or particular artistic styles. It takes an unlearned diffusion model, generates a set of adversarial prompts, and then measures how frequently the model still produces unsafe or undesirable images. The typical end-user is an AI ethics researcher, a machine learning engineer focused on model safety, or a product manager responsible for the ethical deployment of generative AI.
No commits in the last 6 months.
Use this if you need to rigorously test the robustness of your unlearned diffusion models against attempts to generate harmful or forgotten content.
Not ideal if you are looking to unlearn concepts from a diffusion model, as this tool is for evaluating the effectiveness of existing unlearning methods.
Stars
88
Forks
5
Language
Python
License
MIT
Category
Last pushed
Feb 28, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/OPTML-Group/Diffusion-MU-Attack"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
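If you prefer to query the endpoint programmatically rather than via curl, here is a minimal Python sketch using only the standard library. The URL is taken from the curl example above; the response schema is not documented here, so this is an assumption that the endpoint returns JSON, and the helper names (`quality_url`, `fetch_quality`) are illustrative, not part of any official client.

```python
import json
import urllib.request

# Base endpoint as shown in the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/diffusion"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub owner/repo."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record; assumes the endpoint returns JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Network call: prints whatever JSON the API returns for this repo.
    data = fetch_quality("OPTML-Group", "Diffusion-MU-Attack")
    print(json.dumps(data, indent=2))
```

The fetch is kept behind the `__main__` guard so the module can be imported without triggering a network request.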
Higher-rated alternatives
OPTML-Group/Unlearn-Saliency
[ICLR24 (Spotlight)] "SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in...
Shilin-LU/VINE
[ICLR 2025] "Robust Watermarking Using Generative Priors Against Image Editing: From...
WindVChen/DiffAttack
An unrestricted attack based on diffusion models that can achieve both good transferability and...
koninik/DiffusionPen
Official PyTorch Implementation of "DiffusionPen: Towards Controlling the Style of Handwritten...
Wuyxin/DISC
(ICML 2023) Discover and Cure: Concept-aware Mitigation of Spurious Correlation