erfanshayegani/Jailbreak-In-Pieces
[ICLR 2024 Spotlight 🔥] - [Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
This project helps evaluate the safety of vision-language models (VLMs) by testing their susceptibility to 'jailbreak' attacks. Given an image and a text prompt, it generates an adversarial image designed to bypass the VLM's safety alignment, causing the model to comply with harmful requests it would otherwise refuse. The tool is aimed at AI safety researchers and red teamers who need to find and address vulnerabilities in multi-modal AI systems; a sketch of the underlying attack idea appears after the usage notes below.
No commits in the last 6 months.
Use this if you are actively probing vision-language models for vulnerabilities and need to demonstrate how adversarial images combined with benign text can bypass safety mechanisms.
Not ideal if you are looking for a general tool to evaluate text-only large language models or to perform ethical content moderation.
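At a high level, the paper's compositional attack hides the harmful part of a request in the image: an adversarial image is optimized so that its vision-encoder embedding matches the embedding of a harmful target, while the accompanying text prompt stays benign. The sketch below illustrates such an embedding-matching loop against CLIP; the checkpoint name, file names, step count, and learning rate are illustrative assumptions, not the repo's actual configuration.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

# Embedding of the harmful target image (hypothetical file name).
target_inputs = processor(images=Image.open("target.png"), return_tensors="pt").to(device)
with torch.no_grad():
    target_emb = model.get_image_features(**target_inputs)
    target_emb = target_emb / target_emb.norm(dim=-1, keepdim=True)

# Start from a benign image and optimize its (preprocessed) pixel values directly.
# A real attack would also clamp perturbations to keep the image valid and inconspicuous.
adv = processor(images=Image.open("benign.png"), return_tensors="pt")["pixel_values"].to(device)
adv = adv.clone().requires_grad_(True)
opt = torch.optim.Adam([adv], lr=1e-2)

for step in range(500):
    emb = model.get_image_features(pixel_values=adv)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    # Minimize cosine distance between the adversarial and target embeddings.
    loss = 1.0 - torch.cosine_similarity(emb, target_emb).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(f"step {step}: loss {loss.item():.4f}")

The optimized image is then paired with a benign text prompt when querying the target VLM; because the optimization runs against the shared vision encoder rather than any one model, the same image can transfer across VLMs built on that encoder.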
Stars: 80
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Jun 06, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/erfanshayegani/Jailbreak-In-Pieces"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
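For scripted access, the same request can be issued from Python. This is a minimal sketch that assumes the endpoint returns JSON; the response field names are not documented here.

import requests

url = ("https://pt-edge.onrender.com/api/v1/quality/llm-tools/"
       "erfanshayegani/Jailbreak-In-Pieces")
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. hitting the daily rate limit)
print(resp.json())       # field names depend on the API's schema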
Higher-rated alternatives
wuyoscar/ISC-Bench
Internal Safety Collapse: Turning LLMs into a "Jailbroken State" Without "a Jailbreak Attack".
yueliu1999/Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods...
yiksiu-chan/SpeakEasy
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
xirui-li/DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes...
tmlr-group/DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"