llamator and redteam-ai-benchmark
These are complementary tools: LLAMATOR-Core provides a framework for executing red team tests against chatbots and GenAI systems, while redteam-ai-benchmark supplies a structured evaluation methodology and benchmark dataset for assessing LLM vulnerabilities in offensive security contexts.
About llamator
LLAMATOR-Core/llamator
Red Teaming python-framework for testing chatbots and GenAI systems.
This framework helps AI product managers and security engineers systematically test their chatbots and generative AI systems for vulnerabilities. You provide it with your chatbot or GenAI system, and it outputs a test report documenting potential issues like prompt injection, data leakage, and misinformation. This is for professionals responsible for the safety and robustness of AI applications.
About redteam-ai-benchmark
toxy4ny/redteam-ai-benchmark
Red Team AI Benchmark: Evaluating Uncensored LLMs for Offensive Security
This project helps red team operators and penetration testers objectively assess if an AI assistant, especially a local Large Language Model (LLM), is genuinely useful for offensive security tasks. It takes a local LLM or an API-based LLM as input and evaluates its responses to 12 targeted questions covering advanced red team techniques. The output is a clear score indicating whether the LLM is suitable for real-world penetration testing, helping security professionals choose reliable AI tools.
Scores updated daily from GitHub, PyPI, and npm data. How scores work