AI-secure/AgentPoison

[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"

43
/ 100
Emerging

This project helps security researchers and AI safety engineers identify vulnerabilities in large language model (LLM) agents that rely on external knowledge or memory. It allows you to create specific 'backdoor poisons' that, when added to an agent's knowledge base, can cause it to behave incorrectly or maliciously under certain conditions. The output is a set of optimized 'trigger' tokens and performance metrics demonstrating the agent's compromised behavior.

203 stars. No commits in the last 6 months.

Use this if you are a security researcher or AI safety professional looking to test the robustness and identify potential failure points in LLM agents that use Retrieval Augmented Generation (RAG) or similar memory/knowledge base systems.

Not ideal if you are an end-user simply looking to improve the general performance or accuracy of an LLM agent without specific security testing goals.

AI security testing LLM red-teaming AI safety research Vulnerability assessment RAG systems
Stale 6m No Package No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 17 / 25

How are scores calculated?

Stars

203

Forks

27

Language

Python

License

MIT

Last pushed

Apr 12, 2025

Commits (30d)

0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/AI-secure/AgentPoison"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.