CryptoAILab/FigStep

[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts

/ 100

Emerging

This project helps evaluate the safety of large vision-language models (VLMs) by testing their susceptibility to 'jailbreaking' attacks. It takes crafted visual prompts (images with specific text) and benign text instructions as input. The output demonstrates how these VLMs might generate harmful content, even when given seemingly innocent text prompts. This tool is for AI safety researchers and developers who need to assess and improve the robustness of VLMs against misuse.

193 stars. No commits in the last 6 months.

Use this if you need to test the security and safety alignment of vision-language models against sophisticated visual and textual prompts that aim to bypass built-in safeguards.

Not ideal if you are looking for a general-purpose VLM for creative tasks or standard information retrieval, as its sole purpose is to expose model vulnerabilities.

AI-safety model-security red-teaming VLM-evaluation harmful-content-detection

Stale 6m No Package No Dependents

Maintenance 2 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 10 / 25

How are scores calculated?

Stars

193

Forks

Language

Python

License

MIT

Higher-rated alternatives

wuyoscar/ISC-Bench

Internal Safety Collapse: Turning LLMs into a "Jailbroken State" Without "a Jailbreak Attack".

yueliu1999/Awesome-Jailbreak-on-LLMs

Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods...

yiksiu-chan/SpeakEasy

[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

xirui-li/DrAttack

Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes...

tmlr-group/DeepInception

[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"

Explore LLM Tools

All categories Trending LLM Tool directory Insights