xirui-li/DrAttack
Official implementation of the paper "DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers".
This project helps security researchers and red teamers evaluate the robustness of large language models (LLMs) against adversarial prompts. It decomposes a potentially harmful prompt into sub-phrases, searches for synonym substitutions of those sub-phrases, and reconstructs the pieces into candidate 'jailbreak' prompts. The output is an adversarial prompt designed to bypass LLM safety mechanisms; a minimal sketch of this decompose-substitute-reconstruct loop follows the usage guidance below.
No commits in the last 6 months.
Use this if you are a security researcher or red teamer needing to rigorously test the safety alignment of LLMs such as GPT-4, Gemini, or Llama 2.
Not ideal if you are looking for a general-purpose LLM prompt engineering tool or a method to bypass ethical guidelines for malicious intent.
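The description above boils down to a decompose / substitute / reconstruct loop. The following is a minimal, hypothetical Python sketch of that loop only, not the repository's implementation: the word-level splitting, the toy synonym table, and the exhaustive candidate enumeration are illustrative placeholders (the actual method operates on phrase-level sub-prompts and scores candidates against the target LLM).

```python
# Hypothetical sketch of a decompose / substitute / reconstruct loop.
# Not the DrAttack implementation; all names and data here are placeholders.
from itertools import product

# Toy synonym table (assumption): synonym search happens per sub-unit.
SYNONYMS = {
    "write": ["compose", "draft"],
    "explain": ["describe", "outline"],
}

def decompose(prompt: str) -> list[str]:
    """Split a prompt into word-level units (the real method uses parsed phrases)."""
    return prompt.split()

def reconstruct(parts: list[str]) -> str:
    """Reassemble the units into a candidate prompt."""
    return " ".join(parts)

def candidates(prompt: str):
    """Enumerate every synonym substitution over the decomposed prompt."""
    parts = decompose(prompt)
    options = [[p] + SYNONYMS.get(p.lower(), []) for p in parts]
    for combo in product(*options):
        yield reconstruct(list(combo))

if __name__ == "__main__":
    for cand in candidates("write a note and explain the idea"):
        print(cand)  # in practice each candidate would be scored against a target LLM
```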
Stars: 66
Forks: 13
Language: JavaScript
License: MIT
Category:
Last pushed: Aug 25, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/xirui-li/DrAttack"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
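If you would rather call the endpoint from code, the sketch below fetches the same JSON with Python's requests library. The URL comes from the curl command above; the timeout, the error handling, and in particular the header name used to pass an API key are assumptions, not documented behavior of the service.

```python
# Fetch the repository quality data from the public endpoint (keyless tier).
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/xirui-li/DrAttack"

resp = requests.get(URL, timeout=10)  # no key: 100 requests/day
resp.raise_for_status()
print(resp.json())

# With a free key (1,000 requests/day); the "X-API-Key" header name is an
# assumption, check the service's documentation for the real mechanism:
# resp = requests.get(URL, headers={"X-API-Key": "YOUR_KEY"}, timeout=10)
```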
Higher-rated alternatives
wuyoscar/ISC-Bench
Internal Safety Collapse: Turning LLMs into a "Jailbroken State" Without "a Jailbreak Attack".
yueliu1999/Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods...
yiksiu-chan/SpeakEasy
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
tmlr-group/DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"
Techiral/awesome-llm-jailbreaks
Latest AI Jailbreak Payloads & Exploit Techniques for GPT, QWEN, and all LLM Models