toxy4ny/redteam-ai-benchmark

Red Team AI Benchmark: Evaluating Uncensored LLMs for Offensive Security

Score: 36 / 100 (Emerging)

This project helps red team operators and penetration testers objectively assess whether an AI assistant, especially a local Large Language Model (LLM), is genuinely useful for offensive security tasks. It takes a local or API-based LLM as input and evaluates its responses to 12 targeted questions covering advanced red team techniques. The output is a clear score indicating whether the LLM is suitable for real-world penetration testing, helping security professionals choose reliable AI tools.

Use this if you need to determine whether an uncensored AI model can provide accurate, working code and technical advice for complex penetration testing scenarios rather than generic or refused answers.

Not ideal if you are looking to evaluate LLMs for general-purpose coding assistance, creative writing, or tasks outside of offensive security and cybersecurity.
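
To make the evaluation flow concrete, here is a minimal Python sketch of how a benchmark like this might drive a local model. It is not the project's actual code: the endpoint, model name, sample questions, and the crude refusal check are all placeholder assumptions (a real rubric would also grade technical accuracy, not just refusals).

import requests

# Assumed OpenAI-compatible local server (e.g. one exposed by Ollama or
# llama.cpp); both the URL and the model name are placeholders.
ENDPOINT = "http://localhost:11434/v1/chat/completions"
MODEL = "my-local-model"

# The real benchmark uses 12 targeted red team questions; these are stand-ins.
QUESTIONS = [
    "Placeholder question 1",
    "Placeholder question 2",
]

def ask(question: str) -> str:
    """Send one benchmark question to the model and return its answer."""
    resp = requests.post(
        ENDPOINT,
        json={"model": MODEL, "messages": [{"role": "user", "content": question}]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def is_refusal(answer: str) -> bool:
    """Crude refusal detector; a placeholder for a proper scoring rubric."""
    return any(p in answer.lower() for p in ("i can't", "i cannot", "i'm sorry"))

answered = sum(1 for q in QUESTIONS if not is_refusal(ask(q)))
print(f"Non-refused answers: {answered}/{len(QUESTIONS)}")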

penetration-testing red-teaming offensive-security cybersecurity-auditing LLM-evaluation
No package · No dependents
Maintenance: 6 / 25
Adoption: 7 / 25
Maturity: 13 / 25
Community: 10 / 25

The overall score is the sum of these four 25-point categories: 6 + 7 + 13 + 10 = 36 / 100.


Stars: 27
Forks: 3
Language: Python
License: MIT
Last pushed: Dec 25, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/toxy4ny/redteam-ai-benchmark"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
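
The same endpoint can be queried from Python; a minimal sketch, assuming only that it returns JSON (the exact response fields are not documented on this page, so the script just prints whatever comes back):

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/rag/toxy4ny/redteam-ai-benchmark"
resp = requests.get(url, timeout=30)
resp.raise_for_status()
print(resp.json())  # inspect for overall and per-category scores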