yiksiu-chan/SpeakEasy

[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

46 / 100 (Emerging)

This project helps AI safety researchers and red teamers evaluate how robust large language models (LLMs) are against producing harmful outputs. Given a set of queries and responses, it applies the 'Speak Easy' method, which leverages multi-step, multilingual interactions to elicit harmful 'jailbreaks' more effectively. It then outputs a 'HarmScore' that quantifies the actionability and informativeness of the harmful responses, enabling a more precise assessment of an LLM's vulnerabilities.
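As a rough sketch of the scoring idea only (not the repository's actual code), a HarmScore-style metric combines per-response actionability and informativeness judgments into a single number. The function name, the [0, 1] inputs, and the simple averaging rule below are illustrative assumptions and may differ from the paper's exact formulation.

# Hypothetical sketch; the repository's real scoring code and formula may differ.
def harm_score(actionability: float, informativeness: float) -> float:
    # Combine two [0, 1] judgments into one score (simple mean, an assumption).
    return (actionability + informativeness) / 2

# Illustrative values, not real data.
responses = [
    {"actionability": 0.8, "informativeness": 0.9},
    {"actionability": 0.1, "informativeness": 0.4},
]
print([harm_score(r["actionability"], r["informativeness"]) for r in responses])  # [0.85, 0.25]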

Use this if you need to rigorously test how easily an LLM can be manipulated into generating unsafe or undesirable content, especially through nuanced interaction patterns.

Not ideal if you are looking for a general-purpose tool to filter or prevent harmful outputs in a production LLM system, rather than to conduct detailed adversarial testing.

AI safety · LLM red teaming · model security · harmful content detection · adversarial testing
No package · No dependents
Maintenance 10 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 15 / 25

Stars: 14
Forks: 5
Language: Python
License: MIT
Last pushed: Mar 07, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/yiksiu-chan/SpeakEasy"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
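
For scripted access, the same endpoint can also be queried from Python. The snippet below is a minimal sketch assuming the endpoint returns a JSON body; the exact response schema is not documented on this page.

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/yiksiu-chan/SpeakEasy"
resp = requests.get(url, timeout=30)
resp.raise_for_status()  # fail loudly on HTTP errors (e.g., rate limiting)
data = resp.json()       # assumed JSON; inspect the payload to see which fields are available
print(data)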