itsvaibhav01/Immune
[CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
IMMUNE is an inference-time defense that helps multimodal large language model (MLLM) developers and AI safety researchers keep their models from generating harmful content under 'jailbreak' attacks. Given an existing MLLM and image-prompt pairs, it steers decoding through a safety-aware inference procedure to produce safer, more ethical responses (a minimal sketch of the idea follows the usage notes below).
No commits in the last 6 months.
Use this if you are developing or deploying MLLMs and need to harden them against malicious prompts and adversarial images that attempt to bypass safety measures.
Not ideal if you are looking for a general-purpose MLLM or do not need defenses against adversarial jailbreak inputs.
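The paper's title frames the defense as inference-time alignment: instead of fine-tuning the model, generation is steered by a safety signal at decode time. The sketch below illustrates one common form of reward-guided decoding; the scoring rule (log-probability plus a weighted safety reward), the toy vocabulary, and both stand-in models are assumptions for illustration, not the repository's actual implementation.

import math
import random

VOCAB = ["I", "cannot", "help", "with", "that", "request", "."]

def base_logprobs(prefix):
    # Stand-in for the MLLM's next-token distribution (toy model, assumption).
    random.seed(len(prefix))
    logits = [random.uniform(-2.0, 2.0) for _ in VOCAB]
    z = math.log(sum(math.exp(x) for x in logits))
    return [x - z for x in logits]

def safety_reward(token):
    # Stand-in for a safety reward model; real scores would come from a trained model.
    return 1.0 if token in {"cannot", "."} else 0.0

def guided_decode(prompt, alpha=2.0, max_tokens=6):
    # At each step, pick the token maximizing log p(token) + alpha * reward.
    out = []
    for _ in range(max_tokens):
        lps = base_logprobs(prompt + " " + " ".join(out))
        scores = [lp + alpha * safety_reward(tok) for lp, tok in zip(lps, VOCAB)]
        out.append(VOCAB[scores.index(max(scores))])
    return " ".join(out)

print(guided_decode("adversarial image + jailbreak prompt"))

The weight alpha trades fluency against safety; with alpha = 0, decoding reduces to the base model's greedy output.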
Stars: 27
Forks: 1
Language: Python
License: —
Category: —
Last pushed: Jun 11, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/itsvaibhav01/Immune"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
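For scripted access, the same endpoint can be queried from Python with only the standard library. A minimal sketch, assuming the endpoint returns a JSON body (the response schema is not documented here, so inspect the printed structure before relying on specific fields):

import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/itsvaibhav01/Immune"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)  # assumption: the endpoint returns JSON

print(json.dumps(data, indent=2))  # inspect the actual schema here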
Higher-rated alternatives
xirui-li/DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes...
tmlr-group/DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"
UCSB-NLP-Chang/SemanticSmooth
Implementation of paper 'Defending Large Language Models against Jailbreak Attacks via Semantic...
sigeisler/reinforce-attacks-llms
REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and...
DAMO-NLP-SG/multilingual-safety-for-LLMs
[ICLR 2024] Data for "Multilingual Jailbreak Challenges in Large Language Models"