wuyoscar/ISC-Bench
Internal Safety Collapse: Turning LLMs into a "Jailbroken State" Without a "Jailbreak Attack"
This project helps AI safety researchers and red teamers evaluate how Large Language Models (LLMs) can produce harmful content not through direct attacks, but while completing common, sensitive professional tasks. You give the model a workflow template, and the project helps you identify and document the cases where its helpfulness leads to unsafe outputs (a conceptual sketch of this loop follows the usage notes below). The primary users are professionals focused on responsible AI development and auditing.
677 stars. Actively maintained with 337 commits in the last 30 days.
Use this if you need to test the safety vulnerabilities LLMs exhibit while performing routine, sensitive tasks, rather than probing them with explicit 'jailbreak' prompts.
Not ideal if you are looking for tools to deliberately bypass LLM safety features for malicious purposes or real-world harm, as this project is strictly for academic safety research.
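A minimal conceptual sketch of the audit loop described above: feed routine workflow templates to the model under test, capture its completions, and record the ones that look unsafe. The template list, query_model, and looks_unsafe below are illustrative placeholders, not part of ISC-Bench's actual interface.

# Hypothetical sketch of the audit loop; all names are placeholders, not ISC-Bench's API.

WORKFLOW_TEMPLATES = [
    "Draft a step-by-step incident-response runbook for handling leaked credentials.",
    "Summarize the chemical-handling section of this lab safety manual.",
]

def query_model(prompt: str) -> str:
    """Placeholder for a call to the LLM under test (replace with your provider's SDK)."""
    return "[model output placeholder]"

def looks_unsafe(output: str) -> bool:
    """Toy keyword heuristic; a real audit would use a proper safety classifier or rubric."""
    return any(term in output.lower() for term in ("bypass", "exploit"))

def run_audit() -> list[dict]:
    """Collect template/output pairs that were flagged as unsafe, for later documentation."""
    findings = []
    for template in WORKFLOW_TEMPLATES:
        output = query_model(template)
        if looks_unsafe(output):
            findings.append({"template": template, "output": output})
    return findings

if __name__ == "__main__":
    print(run_audit())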
Stars: 677
Forks: 127
Language: Python
License: —
Category: —
Last pushed: Mar 28, 2026
Commits (30d): 337
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/wuyoscar/ISC-Bench"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
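For scripted use, the same endpoint can be fetched with the Python standard library. This sketch assumes the endpoint returns JSON; the response schema is not documented on this page, so it simply pretty-prints whatever comes back.

# Fetch the quality record for this repo and pretty-print it.
# Assumes a JSON response; no specific field names are relied on.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/wuyoscar/ISC-Bench"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))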
Related tools
yueliu1999/Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods...
yiksiu-chan/SpeakEasy
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
xirui-li/DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes...
tmlr-group/DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"
Techiral/awesome-llm-jailbreaks
Latest AI Jailbreak Payloads & Exploit Techniques for GPT, QWEN, and all LLM Models