wuyoscar/ISC-Bench
Internal Safety Collapse: Turning LLMs into a "Jailbroken State" Without a "Jailbreak Attack"
This project helps AI safety researchers and red teamers evaluate how Large Language Models (LLMs) can produce harmful content not through direct attacks, but while completing common, sensitive professional tasks. You give the model a workflow template, and the project helps you identify and document the cases where its helpfulness leads to unsafe outputs (a conceptual sketch of this loop follows the usage notes below). The primary users are professionals focused on responsible AI development and auditing.
677 stars. Actively maintained with 337 commits in the last 30 days.
Use this if you need to test the safety vulnerabilities LLMs exhibit while performing routine, sensitive tasks, rather than probing them with explicit 'jailbreak' prompts.
Not ideal if you are looking for tools to deliberately bypass LLM safety features for malicious purposes or real-world harm, as this project is strictly for academic safety research.
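A minimal conceptual sketch of the audit loop described above: feed routine workflow templates to the model under test, capture its completions, and record the ones that look unsafe. The template list, query_model, and looks_unsafe below are illustrative placeholders, not part of ISC-Bench's actual interface.

# Hypothetical sketch of the audit loop; all names are placeholders, not ISC-Bench's API.

WORKFLOW_TEMPLATES = [
    "Draft a step-by-step incident-response runbook for handling leaked credentials.",
    "Summarize the chemical-handling section of this lab safety manual.",
]

def query_model(prompt: str) -> str:
    """Placeholder for a call to the LLM under test (replace with your provider's SDK)."""
    return "[model output placeholder]"

def looks_unsafe(output: str) -> bool:
    """Toy keyword heuristic; a real audit would use a proper safety classifier or rubric."""
    return any(term in output.lower() for term in ("bypass", "exploit"))

def run_audit() -> list[dict]:
    """Collect template/output pairs that were flagged as unsafe, for later documentation."""
    findings = []
    for template in WORKFLOW_TEMPLATES:
        output = query_model(template)
        if looks_unsafe(output):
            findings.append({"template": template, "output": output})
    return findings

if __name__ == "__main__":
    print(run_audit())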
Stars: 677
Forks: 127
Language: Python
License: —
Category: —
Last pushed: Mar 28, 2026
Commits (30d): 337
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/wuyoscar/ISC-Bench"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
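For scripted use, the same endpoint can be fetched with the Python standard library. This sketch assumes the endpoint returns JSON; the response schema is not documented on this page, so it simply pretty-prints whatever comes back.

# Fetch the quality record for this repo and pretty-print it.
# Assumes a JSON response; no specific field names are relied on.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/wuyoscar/ISC-Bench"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))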
Related tools
yueliu1999/Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods...
yiksiu-chan/SpeakEasy
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
xirui-li/DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes...
tmlr-group/DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"
Techiral/awesome-llm-jailbreaks
Latest AI Jailbreak Payloads & Exploit Techniques for GPT, QWEN, and all LLM Models