grasses/PoisonPrompt

Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo:http://124.220.228.133:11107

/ 100

Experimental

This project helps security researchers and AI red teamers understand how to compromise large language models (LLMs) that use prompts. It takes a pre-trained LLM and specific label tokens as input, then outputs a 'backdoored' LLM that can be manipulated to produce biased or incorrect responses when specific trigger phrases are included in user prompts. The primary users are professionals focused on uncovering vulnerabilities in AI systems.

No commits in the last 6 months.

Use this if you need to demonstrate how a large language model can be subtly manipulated through prompt-based backdoor attacks, or if you're evaluating the robustness of your AI systems against such threats.

Not ideal if you're looking for a tool to improve the fairness, accuracy, or general performance of your LLMs for standard applications.

AI-security LLM-vulnerability red-teaming prompt-engineering-security model-auditing

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 5 / 25

How are scores calculated?

Stars

Forks

Language

Python

License

MIT

Higher-rated alternatives

protectai/llm-guard

The Security Toolkit for LLM Interactions

MaxMLang/pytector

Easy to use LLM Prompt Injection Detection / Detector Python Package with support for local...

utkusen/promptmap

a security scanner for custom LLM applications

agencyenterprise/PromptInject

PromptInject is a framework that assembles prompts in a modular fashion to provide a...

Resk-Security/Resk-LLM

Resk is a robust Python library designed to enhance security and manage context when...

Explore Prompt Engineering Tools

All categories Trending Prompt Engineering directory Insights