erfanshayegani/Jailbreak-In-Pieces

[ICLR 2024 Spotlight 🔥] - [Best Paper Award, SoCal NLP 2023 🏆] - Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models

Overall score: 33 / 100 (Emerging)

This project helps evaluate the safety of vision-language models (VLMs) by testing their susceptibility to 'jailbreak' attacks. It takes an image and a text prompt as input, then generates an adversarial image designed to bypass the VLM's safety filters, causing it to respond to harmful or inappropriate requests. This tool is for AI safety researchers and red teamers who need to find and address vulnerabilities in multi-modal AI systems.
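As a rough illustration of the embedding-matching idea behind such attacks (not the repository's actual code), the sketch below optimizes an image so that a vision encoder embeds it close to a chosen target image, after which it would be paired with a benign text prompt when querying the victim VLM. The CLIP ViT-B/32 encoder (openai/CLIP package), the placeholder file names benign.png and target.png, the Adam optimizer, the step size, and the iteration count are all illustrative assumptions.

# Minimal sketch, assuming a CLIP-style vision encoder; hyperparameters are illustrative.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model = model.float()  # keep weights in fp32 so gradients flow without dtype mismatches
model.eval()

# Placeholder inputs (assumptions): a benign starting image and a target image
# whose embedding the adversarial image should imitate.
benign = preprocess(Image.open("benign.png")).unsqueeze(0).to(device)
target = preprocess(Image.open("target.png")).unsqueeze(0).to(device)

with torch.no_grad():
    target_emb = model.encode_image(target)

adv = benign.clone().detach().requires_grad_(True)
optimizer = torch.optim.Adam([adv], lr=1e-2)

for _ in range(500):  # iteration count is an assumption
    optimizer.zero_grad()
    # Pull the adversarial image's embedding toward the target embedding.
    loss = torch.nn.functional.mse_loss(model.encode_image(adv), target_emb)
    loss.backward()
    optimizer.step()

# `adv` now embeds close to the target image. A real attack would also clamp
# pixel values to a valid range, which this sketch omits for brevity.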

No commits in the last 6 months.

Use this if you are actively probing vision-language models for vulnerabilities and need to demonstrate how adversarial images combined with benign text can bypass safety mechanisms.

Not ideal if you are looking for a general tool to evaluate text-only large language models or to perform ethical content moderation.

Tags: AI Safety, Red Teaming, Vulnerability Assessment, Generative AI Evaluation, Multimodal AI
Status: Stale (6 months), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 9 / 25
Maturity: 16 / 25
Community: 8 / 25

Stars: 80
Forks: 5
Language: Python
License: MIT
Last pushed: Jun 06, 2024
Commits (last 30 days): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/erfanshayegani/Jailbreak-In-Pieces"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
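For scripted use, the same endpoint can be queried from Python; a minimal sketch with the requests library is below. The exact JSON schema of the response isn't documented on this page, so the example simply prints whatever comes back.

# Minimal sketch of calling the API from Python; response fields are not documented here.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/erfanshayegani/Jailbreak-In-Pieces"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. rate limiting) early
print(resp.json())       # e.g. overall score and the maintenance/adoption/maturity/community breakdown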