FishCodeTech/ctf-agent-benchmark

Benchmarking platform for evaluating AI agents on CTF-style tasks and tool-use workflows.

Score: 27 / 100 (Experimental)

This platform helps security researchers, ethical hackers, competitive hacking teams, and AI security developers evaluate how well AI agents can autonomously identify and exploit vulnerabilities in Capture The Flag (CTF) security challenges. It takes an AI agent as input, provides it with CTF challenges and supporting tools, and outputs a score based on the agent's ability to exploit each challenge and submit the correct flags.
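
A minimal sketch of the agent-in, score-out contract described above. None of these names come from the repository; Challenge, Agent, and run_benchmark are hypothetical and only illustrate the flow: the agent receives a challenge, attempts it, and its submitted flag is checked against ground truth.

# Hypothetical illustration of the benchmark loop; names are assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Challenge:
    name: str
    prompt: str          # challenge description and artifacts shown to the agent
    expected_flag: str   # ground-truth flag used for scoring

# An "agent" here is any callable mapping a challenge prompt to a flag guess.
Agent = Callable[[str], str]

def run_benchmark(agent: Agent, challenges: list[Challenge]) -> float:
    """Return the fraction of challenges whose submitted flag is correct."""
    solved = sum(
        1 for c in challenges if agent(c.prompt).strip() == c.expected_flag
    )
    return solved / len(challenges)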

Use this if you need a standardized environment to benchmark and compare the autonomous hacking capabilities of different large language model (LLM) agents on security tasks.

Not ideal if you are looking for a tool to practice CTF challenges manually, or for a general-purpose AI development framework.

Tags: cybersecurity, CTF, ethical-hacking, AI-security, vulnerability-assessment
No package · No dependents
Maintenance: 13 / 25
Adoption: 5 / 25
Maturity: 9 / 25
Community: 0 / 25


Stars: 10
Forks: –
Language: Python
License: GPL-3.0
Category: research-agent
Last pushed: Mar 14, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/FishCodeTech/ctf-agent-benchmark"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
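
The same request from Python, for scripted use. The endpoint URL is taken from the curl example above; the JSON response schema is not documented here, so this sketch just pretty-prints whatever comes back rather than assuming field names.

# Fetch the quality record via the public API (no key, stdlib only).
import json
import urllib.request

URL = (
    "https://pt-edge.onrender.com/api/v1/quality/agents/"
    "FishCodeTech/ctf-agent-benchmark"
)

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# Field names are undocumented, so print the full payload.
print(json.dumps(data, indent=2))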