wangywUST/OutputJailbreak

Repository for our paper "Frustratingly Easy Jailbreak of Large Language Models via Output Prefix Attacks". https://www.researchsquare.com/article/rs-4385503/latest

13
/ 100
Experimental

This project offers methods to test the security vulnerabilities of large language models (LLMs). It takes a malicious request, like generating harmful content, and applies simple techniques to bypass the model's safety filters, producing the harmful output. This is useful for AI security researchers, red teamers, and developers responsible for evaluating and hardening LLMs against misuse.

No commits in the last 6 months.

Use this if you need to quickly and easily assess how susceptible a black-box large language model is to generating unsafe or malicious content.

Not ideal if you are looking for methods to improve the safety mechanisms of an LLM or want to prevent jailbreaks rather than perform them.

AI security LLM red-teaming vulnerability testing AI safety evaluation model robustness
No License Stale 6m No Package No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 0 / 25

How are scores calculated?

Stars

9

Forks

Language

Jupyter Notebook

License

Last pushed

Jun 19, 2024

Commits (30d)

0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/wangywUST/OutputJailbreak"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.