AetherPrior/TrickLLM
This repository contains the code for the paper "Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks" by Abhinav Rao, Sachin Vashishta*, Atharva Naik*, Somak Aditya, and Monojit Choudhury, accepted at LREC-CoLING 2024.
This tool helps AI safety researchers and red teamers understand how Large Language Models (LLMs) can be manipulated to produce unwanted or harmful content. It takes various 'jailbreak' prompts and base prompts as input, runs them against different LLMs (like GPT-based models, OPT, BLOOM, FLAN-T5-XXL), and provides detailed analysis and success rates of these attacks. The output helps in formalizing, analyzing, and detecting such deceptive behaviors.
No commits in the last 6 months.
Use this if you are an AI safety researcher or practitioner focused on understanding, evaluating, and mitigating prompt injection and 'jailbreaking' vulnerabilities in large language models.
Not ideal if you are looking for a simple, out-of-the-box content-moderation solution, or a way to apply fixes directly to a production LLM without first analyzing the attack vectors.
Stars: 8
Forks: 2
Language: Jupyter Notebook
License: AGPL-3.0
Last pushed: May 22, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/AetherPrior/TrickLLM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
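The same request can be made from Python. A minimal sketch using only the standard library, assuming the endpoint returns JSON (the response schema is not documented on this page, so treat the parsed result's fields as unknown):

```python
# Hypothetical Python equivalent of the curl command above. The endpoint
# path comes from this page; the JSON response schema is an assumption.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def build_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a given GitHub repository."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as parsed JSON (keyless tier: 100 req/day)."""
    with urllib.request.urlopen(build_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Prints the URL queried; call fetch_quality() to retrieve the record.
    print(build_url("AetherPrior", "TrickLLM"))
```

For the higher 1,000 requests/day tier, the key would presumably be sent with the request (header or query parameter), but the exact mechanism is not specified here.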
Higher-rated alternatives
wuyoscar/ISC-Bench
Internal Safety Collapse: Turning LLMs into a "Jailbroken State" Without "a Jailbreak Attack".
yueliu1999/Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods...
yiksiu-chan/SpeakEasy
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
xirui-li/DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes...
tmlr-group/DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"