KazKozDev/system-prompt-benchmark

Test your LLM system prompts against 287 real-world attack vectors including prompt injection, jailbreaks, and data leaks.

24
/ 100
Experimental

When you're building products with Large Language Models (LLMs), this tool helps you automatically test your LLM's core instructions, known as 'system prompts.' You provide your system prompt and select an LLM provider, and the tool runs hundreds of simulated 'attack' scenarios to see how well your prompt defends against things like jailbreaks or data leaks. This is for product managers, AI safety engineers, and anyone deploying LLM-powered applications who needs to ensure their AI behaves as intended.

Use this if you need to rigorously test your LLM system prompts against real-world adversarial inputs before deploying your AI product, ensuring it's robust and secure.

Not ideal if you're testing the entire application pipeline, including user interface elements and complex multi-turn workflows, rather than primarily evaluating the core system prompt's resilience.

AI-safety LLM-security prompt-engineering AI-product-development red-teaming
No Package No Dependents
Maintenance 6 / 25
Adoption 5 / 25
Maturity 13 / 25
Community 0 / 25

How are scores calculated?

Stars

11

Forks

Language

Python

License

MIT

Last pushed

Dec 02, 2025

Commits (30d)

0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/KazKozDev/system-prompt-benchmark"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.