FishCodeTech/ctf-agent-benchmark
Benchmarking platform for evaluating AI agents on CTF-style tasks and tool-use workflows.
This platform helps cybersecurity researchers and competitive hacking teams evaluate how well AI agents can autonomously identify and exploit vulnerabilities in CTF (Capture The Flag) security challenges. It takes an AI agent as input, supplies it with CTF challenges and tooling, and outputs a score based on its ability to exploit the challenges and submit valid flags. Security researchers, ethical hackers, and AI security developers are the intended users of this platform.
Use this if you need a standardized environment to benchmark and compare the autonomous hacking capabilities of different large language model (LLM) agents on security tasks.
Not ideal if you are looking for a tool to manually practice CTF challenges or a general-purpose AI development framework.
Stars: 10
Forks: —
Language: Python
License: GPL-3.0
Category: —
Last pushed: Mar 14, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/FishCodeTech/ctf-agent-benchmark"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
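If you prefer calling the endpoint from Python instead of curl, a minimal sketch follows. It only assumes what the listing above states: the URL pattern `…/api/v1/quality/agents/<owner>/<repo>` and a JSON response; the field names in that response are not documented here, so the code returns the parsed payload as-is rather than assuming a schema. The function names `agent_quality_url` and `fetch_quality` are illustrative, not part of the API.

```python
import json
import urllib.request

# Base endpoint as shown in the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/agents"


def agent_quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a GitHub-style owner/repo slug."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as parsed JSON.

    Anonymous access is limited to 100 requests/day; pass-through of the
    response schema is intentional since the fields are undocumented here.
    """
    with urllib.request.urlopen(agent_quality_url(owner, repo)) as resp:
        return json.load(resp)
```

Usage: `fetch_quality("FishCodeTech", "ctf-agent-benchmark")` mirrors the curl command above.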
Higher-rated alternatives
alvinreal/awesome-autoresearch
A curated list of autonomous improvement loops, research agents, and autoresearch-style systems...
0xSteph/pentest-ai-agents
Turn Claude Code into your offensive security research assistant. Specialized AI subagents for...
wulinteousa2-hash/napari-chat-assistant
A local agent architecture for semantic-aware interaction between large language models and...
saksham-jain177/AI-Agent-based-Deep-Research
Deep Research AI Agent is a dual-agent system that conducts web-based research and generates...
theam/limina
Autonomous research harness for AI agents. Give it a measurable goal — it hypothesizes,...