tmlr-group/DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"
This project helps AI safety researchers and red teamers evaluate the robustness of large language models (LLMs) against sophisticated manipulation. It uses a carefully constructed prompt to "hypnotize" an LLM into a nested role-play scenario, producing a clear demonstration of how the model can be coaxed into generating harmful or restricted content that its safeguards would otherwise block. It is intended for anyone responsible for assessing and improving the safety of AI systems.
172 stars. No commits in the last 6 months.
Use this if you need to systematically test and identify critical weaknesses in LLM safety guardrails by leveraging an LLM's personification abilities to create nested, deceptive scenarios.
Not ideal if you are looking for a simple, brute-force method to bypass LLM restrictions, as this tool focuses on a more advanced, context-aware approach.
Stars: 172
Forks: 20
Language: Python
License: MIT
Category: llm-tools
Last pushed: Feb 20, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/tmlr-group/DeepInception"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
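If you prefer to fetch these stats from a script rather than curl, a minimal Python sketch follows. It assumes only that the endpoint returns JSON; the response's field names are not documented on this page, so the example simply prints whatever comes back.

import requests

# Minimal sketch: fetch this repo's stats from the pt-edge quality API.
# Assumption: the endpoint returns JSON, and anonymous access is allowed
# up to 100 requests/day, so no API key is set here.
URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/tmlr-group/DeepInception")

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. 429 if rate-limited)
print(resp.json())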
Higher-rated alternatives
wuyoscar/ISC-Bench
Internal Safety Collapse: Turning LLMs into a "Jailbroken State" Without "a Jailbreak Attack".
yueliu1999/Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods...
yiksiu-chan/SpeakEasy
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
xirui-li/DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes...
Techiral/awesome-llm-jailbreaks
Latest AI Jailbreak Payloads & Exploit Techniques for GPT, QWEN, and all LLM Models