itsvaibhav01/Immune
[CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
IMMUNE is an inference-time defense that helps multimodal large language model (MLLM) developers and AI safety researchers keep their models from generating harmful content under 'jailbreak' attacks. Given an existing MLLM and image-prompt pairs, it steers decoding through a safety-aware inference procedure to produce safer, more ethical responses (a minimal sketch of the idea follows the usage notes below).
No commits in the last 6 months.
Use this if you are developing or deploying MLLMs and need to harden them against malicious prompts and adversarial images that attempt to bypass safety measures.
Not ideal if you are looking for a general-purpose MLLM or do not need defenses against adversarial jailbreak inputs.
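The paper's title frames the defense as inference-time alignment: instead of fine-tuning the model, generation is steered by a safety signal at decode time. The sketch below illustrates one common form of reward-guided decoding; the scoring rule (log-probability plus a weighted safety reward), the toy vocabulary, and both stand-in models are assumptions for illustration, not the repository's actual implementation.

import math
import random

VOCAB = ["I", "cannot", "help", "with", "that", "request", "."]

def base_logprobs(prefix):
    # Stand-in for the MLLM's next-token distribution (toy model, assumption).
    random.seed(len(prefix))
    logits = [random.uniform(-2.0, 2.0) for _ in VOCAB]
    z = math.log(sum(math.exp(x) for x in logits))
    return [x - z for x in logits]

def safety_reward(token):
    # Stand-in for a safety reward model; real scores would come from a trained model.
    return 1.0 if token in {"cannot", "."} else 0.0

def guided_decode(prompt, alpha=2.0, max_tokens=6):
    # At each step, pick the token maximizing log p(token) + alpha * reward.
    out = []
    for _ in range(max_tokens):
        lps = base_logprobs(prompt + " " + " ".join(out))
        scores = [lp + alpha * safety_reward(tok) for lp, tok in zip(lps, VOCAB)]
        out.append(VOCAB[scores.index(max(scores))])
    return " ".join(out)

print(guided_decode("adversarial image + jailbreak prompt"))

The weight alpha trades fluency against safety; with alpha = 0, decoding reduces to the base model's greedy output.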
Stars: 27
Forks: 1
Language: Python
License: —
Category: —
Last pushed: Jun 11, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/itsvaibhav01/Immune"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
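For scripted access, the same endpoint can be queried from Python with only the standard library. A minimal sketch, assuming the endpoint returns a JSON body (the response schema is not documented here, so inspect the printed structure before relying on specific fields):

import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/itsvaibhav01/Immune"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)  # assumption: the endpoint returns JSON

print(json.dumps(data, indent=2))  # inspect the actual schema here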
Higher-rated alternatives
xirui-li/DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes...
tmlr-group/DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"
UCSB-NLP-Chang/SemanticSmooth
Implementation of paper 'Defending Large Language Models against Jailbreak Attacks via Semantic...
sigeisler/reinforce-attacks-llms
REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and...
DAMO-NLP-SG/multilingual-safety-for-LLMs
[ICLR 2024] Data for "Multilingual Jailbreak Challenges in Large Language Models"