erfanshayegani/Jailbreak-In-Pieces
[ICLR 2024 Spotlight 🔥] - [Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
This project helps evaluate the safety of vision-language models (VLMs) by testing their susceptibility to 'jailbreak' attacks. Given an image and a text prompt, it generates an adversarial image designed to bypass the VLM's safety alignment, causing the model to comply with harmful requests it would otherwise refuse. The tool is aimed at AI safety researchers and red teamers who need to find and address vulnerabilities in multi-modal AI systems; a sketch of the underlying attack idea appears after the usage notes below.
No commits in the last 6 months.
Use this if you are actively probing vision-language models for vulnerabilities and need to demonstrate how adversarial images combined with benign text can bypass safety mechanisms.
Not ideal if you are looking for a general tool to evaluate text-only large language models or to perform ethical content moderation.
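At a high level, the paper's compositional attack hides the harmful part of a request in the image: an adversarial image is optimized so that its vision-encoder embedding matches the embedding of a harmful target, while the accompanying text prompt stays benign. The sketch below illustrates such an embedding-matching loop against CLIP; the checkpoint name, file names, step count, and learning rate are illustrative assumptions, not the repo's actual configuration.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

# Embedding of the harmful target image (hypothetical file name).
target_inputs = processor(images=Image.open("target.png"), return_tensors="pt").to(device)
with torch.no_grad():
    target_emb = model.get_image_features(**target_inputs)
    target_emb = target_emb / target_emb.norm(dim=-1, keepdim=True)

# Start from a benign image and optimize its (preprocessed) pixel values directly.
# A real attack would also clamp perturbations to keep the image valid and inconspicuous.
adv = processor(images=Image.open("benign.png"), return_tensors="pt")["pixel_values"].to(device)
adv = adv.clone().requires_grad_(True)
opt = torch.optim.Adam([adv], lr=1e-2)

for step in range(500):
    emb = model.get_image_features(pixel_values=adv)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    # Minimize cosine distance between the adversarial and target embeddings.
    loss = 1.0 - torch.cosine_similarity(emb, target_emb).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(f"step {step}: loss {loss.item():.4f}")

The optimized image is then paired with a benign text prompt when querying the target VLM; because the optimization runs against the shared vision encoder rather than any one model, the same image can transfer across VLMs built on that encoder.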
Stars: 80
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Jun 06, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/erfanshayegani/Jailbreak-In-Pieces"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
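For scripted access, the same request can be issued from Python. This is a minimal sketch that assumes the endpoint returns JSON; the response field names are not documented here.

import requests

url = ("https://pt-edge.onrender.com/api/v1/quality/llm-tools/"
       "erfanshayegani/Jailbreak-In-Pieces")
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. hitting the daily rate limit)
print(resp.json())       # field names depend on the API's schema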
Higher-rated alternatives
wuyoscar/ISC-Bench
Internal Safety Collapse: Turning LLMs into a "Jailbroken State" Without "a Jailbreak Attack".
yueliu1999/Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods...
yiksiu-chan/SpeakEasy
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
xirui-li/DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes...
tmlr-group/DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"