tmlr-group/DeepInception

[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"

/ 100

Emerging

This project helps AI safety researchers and red teamers evaluate the robustness of large language models (LLMs) against sophisticated manipulation. It takes a carefully constructed prompt that "hypnotizes" the LLM and reveals its hidden vulnerabilities, outputting a clear demonstration of how the LLM can be coaxed into generating harmful or restricted content. Anyone responsible for assessing and improving the safety of AI systems would use this tool.

172 stars. No commits in the last 6 months.

Use this if you need to systematically test and identify critical weaknesses in LLM safety guardrails by leveraging an LLM's personification abilities to create nested, deceptive scenarios.

Not ideal if you are looking for a simple, brute-force method to bypass LLM restrictions, as this tool focuses on a more advanced, context-aware approach.

AI safety research red teaming LLM security model evaluation vulnerability assessment

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 15 / 25

How are scores calculated?

Stars

172

Forks

Language

Python

License

MIT

Higher-rated alternatives

xirui-li/DrAttack

Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes...

UCSB-NLP-Chang/SemanticSmooth

Implementation of paper 'Defending Large Language Models against Jailbreak Attacks via Semantic...

sigeisler/reinforce-attacks-llms

REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and...

DAMO-NLP-SG/multilingual-safety-for-LLMs

[ICLR 2024]Data for "Multilingual Jailbreak Challenges in Large Language Models"

erfanshayegani/Jailbreak-In-Pieces

[ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces:...

Explore Transformer Models

All categories Trending Transformer directory Insights