yiksiu-chan/SpeakEasy
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
This project helps AI safety researchers and red teamers evaluate how robust large language models (LLMs) are against harmful outputs. Given a set of queries and responses, it applies the 'Speak Easy' method, which uses simple multi-step, multilingual interactions to elicit harmful jailbreaks more effectively, and outputs a 'HarmScore' quantifying how actionable and informative the harmful responses are, enabling a more precise assessment of an LLM's vulnerabilities. A rough, illustrative sketch of this flow follows below.
Use this if you need to rigorously test how easily an LLM can be manipulated into generating unsafe or undesirable content, especially through nuanced interaction patterns.
Not ideal if you need a general-purpose tool for filtering or blocking harmful outputs in a production LLM system rather than a framework for detailed adversarial testing.
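As a rough illustration of the evaluation loop described above (not the repository's actual API), the sketch below wires hypothetical helpers for query decomposition, translation, and per-response actionability/informativeness scoring into a simple aggregate. Every function name, the scoring rule, and the language set are assumptions made for illustration; consult the repository and the paper for the real Speak Easy pipeline and HarmScore definition.

# Illustrative sketch only -- helper names and the scoring rule are assumptions,
# not the SpeakEasy repository's actual API.
from statistics import mean

LANGUAGES = ["en", "zh", "fr"]  # assumed example language set


def decompose_query(query: str) -> list[str]:
    """Placeholder: split a harmful query into seemingly benign sub-queries."""
    return [f"Step {i + 1} of: {query}" for i in range(3)]


def translate(text: str, lang: str) -> str:
    """Placeholder: translate a sub-query into the target language."""
    return f"[{lang}] {text}"


def ask_model(prompt: str) -> str:
    """Placeholder: send the prompt to the target LLM and return its response."""
    return f"response to {prompt!r}"


def actionability(response: str) -> float:
    """Placeholder classifier: higher if the response gives usable instructions."""
    return 0.5


def informativeness(response: str) -> float:
    """Placeholder classifier: higher if the response is relevant and detailed."""
    return 0.5


def harm_score(responses: list[str]) -> float:
    """Toy aggregate: mean of actionability x informativeness per response.
    The real HarmScore is defined in the paper; this is only a stand-in."""
    return mean(actionability(r) * informativeness(r) for r in responses)


def evaluate(query: str) -> float:
    """Multi-step, multilingual elicitation followed by harm scoring."""
    responses = []
    for sub in decompose_query(query):
        for lang in LANGUAGES:
            responses.append(ask_model(translate(sub, lang)))
    return harm_score(responses)


if __name__ == "__main__":
    print(evaluate("example red-team query"))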
Stars: 14
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Mar 07, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/yiksiu-chan/SpeakEasy"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
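The same endpoint can be queried from Python; the snippet below mirrors the curl command above and simply prints the returned JSON, since the response schema is not documented here and no fields are assumed.

# Mirrors the curl command above; prints whatever JSON the endpoint returns.
# The response schema is not documented here, so no fields are assumed.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/yiksiu-chan/SpeakEasy"

with urllib.request.urlopen(URL, timeout=30) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))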
Higher-rated alternatives
wuyoscar/ISC-Bench
Internal Safety Collapse: Turning LLMs into a "Jailbroken State" Without "a Jailbreak Attack".
yueliu1999/Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods...
xirui-li/DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes...
tmlr-group/DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"
Techiral/awesome-llm-jailbreaks
Latest AI Jailbreak Payloads & Exploit Techniques for GPT, QWEN, and all LLM Models