yiksiu-chan/SpeakEasy
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
This project helps AI safety researchers and red teamers evaluate how robust large language models (LLMs) are against harmful outputs. Given a set of queries and responses, it applies the 'Speak Easy' method, which uses simple multi-step, multilingual interactions to elicit harmful jailbreaks more effectively, and outputs a 'HarmScore' quantifying how actionable and informative the harmful responses are, enabling a more precise assessment of an LLM's vulnerabilities. A rough, illustrative sketch of this flow follows below.
Use this if you need to rigorously test how easily an LLM can be manipulated into generating unsafe or undesirable content, especially through nuanced interaction patterns.
Not ideal if you need a general-purpose tool for filtering or blocking harmful outputs in a production LLM system rather than a framework for detailed adversarial testing.
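As a rough illustration of the evaluation loop described above (not the repository's actual API), the sketch below wires hypothetical helpers for query decomposition, translation, and per-response actionability/informativeness scoring into a simple aggregate. Every function name, the scoring rule, and the language set are assumptions made for illustration; consult the repository and the paper for the real Speak Easy pipeline and HarmScore definition.

# Illustrative sketch only -- helper names and the scoring rule are assumptions,
# not the SpeakEasy repository's actual API.
from statistics import mean

LANGUAGES = ["en", "zh", "fr"]  # assumed example language set


def decompose_query(query: str) -> list[str]:
    """Placeholder: split a harmful query into seemingly benign sub-queries."""
    return [f"Step {i + 1} of: {query}" for i in range(3)]


def translate(text: str, lang: str) -> str:
    """Placeholder: translate a sub-query into the target language."""
    return f"[{lang}] {text}"


def ask_model(prompt: str) -> str:
    """Placeholder: send the prompt to the target LLM and return its response."""
    return f"response to {prompt!r}"


def actionability(response: str) -> float:
    """Placeholder classifier: higher if the response gives usable instructions."""
    return 0.5


def informativeness(response: str) -> float:
    """Placeholder classifier: higher if the response is relevant and detailed."""
    return 0.5


def harm_score(responses: list[str]) -> float:
    """Toy aggregate: mean of actionability x informativeness per response.
    The real HarmScore is defined in the paper; this is only a stand-in."""
    return mean(actionability(r) * informativeness(r) for r in responses)


def evaluate(query: str) -> float:
    """Multi-step, multilingual elicitation followed by harm scoring."""
    responses = []
    for sub in decompose_query(query):
        for lang in LANGUAGES:
            responses.append(ask_model(translate(sub, lang)))
    return harm_score(responses)


if __name__ == "__main__":
    print(evaluate("example red-team query"))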
Stars: 14
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Mar 07, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/yiksiu-chan/SpeakEasy"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
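The same endpoint can be queried from Python; the snippet below mirrors the curl command above and simply prints the returned JSON, since the response schema is not documented here and no fields are assumed.

# Mirrors the curl command above; prints whatever JSON the endpoint returns.
# The response schema is not documented here, so no fields are assumed.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/yiksiu-chan/SpeakEasy"

with urllib.request.urlopen(URL, timeout=30) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))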
Higher-rated alternatives
wuyoscar/ISC-Bench
Internal Safety Collapse: Turning LLMs into a "Jailbroken State" Without "a Jailbreak Attack".
yueliu1999/Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods...
xirui-li/DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes...
tmlr-group/DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"
Techiral/awesome-llm-jailbreaks
Latest AI Jailbreak Payloads & Exploit Techniques for GPT, QWEN, and all LLM Models