usestrix/benchmarks

Evaluation harness for Strix agent

/ 100

Emerging

This tool helps cybersecurity professionals evaluate how well Strix agents perform against common web security threats. You supply a Strix agent, and the harness runs it through a series of simulated capture-the-flag (CTF) challenges, then reports on how well the agent identifies and responds to exploits. It is aimed at security engineers and red teamers who need to assess agent effectiveness.

Use this if you need to rigorously test and benchmark how well your Strix security agent identifies web vulnerabilities.

Not ideal if you are looking for a general web vulnerability scanner or a tool to evaluate security products other than Strix agents.

cybersecurity-evaluation web-security-testing ctf-benchmarking security-agent-assessment red-teaming-tools

No Package No Dependents

Maintenance 10 / 25

Adoption 5 / 25

Maturity 11 / 25

Community 8 / 25

Language: Python

License: Apache-2.0

Higher-rated alternatives

strands-agents/evals

A comprehensive evaluation framework for AI agents and LLM applications.

eve-mas/eve-parity

Equilibrium Verification Environment (EVE) is a formal verification tool for the automated...

KazKozDev/murmur

A Mix of Agents Orchestration System for Distributed LLM Processing

tanvirbhachu/ai-bench

A CLI benchmark runner for testing AI Models quickly.

davidset13/intelligence_eval

This will allow any agent to use LLM evaluation benchmarks. Currently, this only supports the...
