AI-secure/AgentPoison
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
This project helps security researchers and AI safety engineers identify vulnerabilities in large language model (LLM) agents that rely on external knowledge or memory. It allows you to create specific 'backdoor poisons' that, when added to an agent's knowledge base, can cause it to behave incorrectly or maliciously under certain conditions. The output is a set of optimized 'trigger' tokens and performance metrics demonstrating the agent's compromised behavior.
203 stars. No commits in the last 6 months.
Use this if you are a security researcher or AI safety professional looking to test the robustness and identify potential failure points in LLM agents that use Retrieval Augmented Generation (RAG) or similar memory/knowledge base systems.
Not ideal if you are an end-user simply looking to improve the general performance or accuracy of an LLM agent without specific security testing goals.
Stars
203
Forks
27
Language
Python
License
MIT
Category
Last pushed
Apr 12, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/AI-secure/AgentPoison"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
LLAMATOR-Core/llamator
Red Teaming python-framework for testing chatbots and GenAI systems.
sleeepeer/PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented...
kelkalot/simpleaudit
Allows to red-team your AI systems through adversarial probing. It is simple, effective, and...
JuliusHenke/autopentest
CLI enabling more autonomous black-box penetration tests using Large Language Models (LLMs)
SecurityClaw/SecurityClaw
A modular, skill-based autonomous Security Operations Center (SOC) agent that monitors...