xirui-li/DrAttack

Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers

Score: 42 / 100 (Emerging)

This project helps security researchers and red teamers evaluate the robustness of large language models (LLMs) against adversarial prompts. It decomposes a potentially harmful prompt into sub-prompts, reconstructs them with subtle changes, and searches for synonyms to produce 'jailbreak' prompts. The output is an adversarial prompt designed to bypass LLM safety mechanisms.

No commits in the last 6 months.

Use this if you are a security researcher or red teamer needing to rigorously test the safety alignment of LLMs like GPT-4, Gemini, or Llama2.

Not ideal if you are looking for a general-purpose LLM prompt engineering tool or a way to bypass safety guidelines for malicious purposes.

Tags: LLM security, red teaming, prompt vulnerability, AI safety evaluation, adversarial testing
Badges: Stale (6m), No Package, No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 18 / 25

Stars: 66
Forks: 13
Language: JavaScript
License: MIT
Last pushed: Aug 25, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/xirui-li/DrAttack"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
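
A minimal sketch of consuming the endpoint from Python, assuming the API returns JSON; the field name used below ("score") is an assumption, not a documented response field:

import json
import urllib.request

# Hypothetical consumer of the quality endpoint shown above.
# Field names such as "score" are assumptions; inspect the actual
# JSON payload before relying on them.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/xirui-li/DrAttack"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.loads(resp.read().decode("utf-8"))

print(json.dumps(data, indent=2))    # dump the full payload
print("score:", data.get("score"))   # e.g. 42 (assumed field name)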