usestrix/benchmarks
Evaluation harness for Strix agent
This tool helps cybersecurity professionals evaluate how well Strix agents perform against common web security threats. You provide a Strix agent, and the harness runs that agent through a series of simulated capture-the-flag (CTF) challenges, then reports on the agent's ability to identify and respond to exploits. It is aimed at security engineers and red teamers who need to assess agent effectiveness.
Use this if you need to rigorously test and benchmark the performance of your Strix security agent in identifying web vulnerabilities.
Not ideal if you are looking for a general web vulnerability scanner or a tool to evaluate security products other than Strix agents.
Stars: 9
Forks: 1
Language: Python
License: Apache-2.0
Category:
Last pushed: Jan 23, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/usestrix/benchmarks"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
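For scripted use, the same request can be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page; the response schema is not documented here, so the sketch returns the decoded payload without interpreting it.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/agents"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual names cannot break the URL.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for owner/repo and decode it.

    Assumption: the body is JSON. If it is not, json.loads raises ValueError.
    """
    request = urllib.request.Request(
        build_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("usestrix", "benchmarks")
# print(json.dumps(data, indent=2))
```

Because unauthenticated access is limited to 100 requests/day, a script that polls many repositories would likely want to cache responses locally rather than refetch them.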
Higher-rated alternatives
strands-agents/evals
A comprehensive evaluation framework for AI agents and LLM applications.
eve-mas/eve-parity
Equilibrium Verification Environment (EVE) is a formal verification tool for the automated...
KazKozDev/murmur
A Mix of Agents Orchestration System for Distributed LLM Processing
tanvirbhachu/ai-bench
A CLI benchmark runner for testing AI Models quickly.
davidset13/intelligence_eval
This will allow any agent to use LLM evaluation benchmarks. Currently, this only supports the...