AI-secure/AgentPoison

[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"

/ 100

Emerging

This project helps security researchers and AI safety engineers identify vulnerabilities in large language model (LLM) agents that rely on external knowledge or memory. It allows you to create specific 'backdoor poisons' that, when added to an agent's knowledge base, can cause it to behave incorrectly or maliciously under certain conditions. The output is a set of optimized 'trigger' tokens and performance metrics demonstrating the agent's compromised behavior.

203 stars. No commits in the last 6 months.

Use this if you are a security researcher or AI safety professional looking to test the robustness and identify potential failure points in LLM agents that use Retrieval Augmented Generation (RAG) or similar memory/knowledge base systems.

Not ideal if you are an end-user simply looking to improve the general performance or accuracy of an LLM agent without specific security testing goals.

AI security testing LLM red-teaming AI safety research Vulnerability assessment RAG systems

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 17 / 25

How are scores calculated?

Stars

203

Forks

Language

Python

License

MIT

Higher-rated alternatives

LLAMATOR-Core/llamator

Red Teaming python-framework for testing chatbots and GenAI systems.

sleeepeer/PoisonedRAG

[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented...

kelkalot/simpleaudit

Allows to red-team your AI systems through adversarial probing. It is simple, effective, and...

JuliusHenke/autopentest

CLI enabling more autonomous black-box penetration tests using Large Language Models (LLMs)

SecurityClaw/SecurityClaw

A modular, skill-based autonomous Security Operations Center (SOC) agent that monitors...

Explore RAG Tools

All categories Trending RAG directory Insights