verazuo/jailbreak_llms

[CCS'24] A dataset of 15,140 ChatGPT prompts collected from Reddit, Discord, websites, and open-source datasets, including 1,405 jailbreak prompts.

Score: 45 / 100 (Emerging)

This project provides a collection of over 15,000 real-world prompts, including 1,400+ 'jailbreak' prompts designed to bypass AI safety filters. It helps AI safety researchers and developers understand how users attempt to elicit harmful content from large language models. The dataset of prompts serves as the input; the outputs are insights into common jailbreaking techniques and a structured question set for evaluation.

3,596 stars. No commits in the last 6 months.

Use this if you are an AI safety researcher, LLM developer, or policy maker seeking to analyze, understand, and defend against methods used to bypass large language model safeguards.

Not ideal if you are looking for a tool to generate harmful content or for general prompt engineering resources that do not focus on security vulnerabilities.

AI safety · LLM security · prompt engineering · research · content moderation · harmful content detection
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 19 / 25


Stars: 3,596
Forks: 320
Language: Jupyter Notebook
License: MIT
Last pushed: Dec 24, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/verazuo/jailbreak_llms"

Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.