FishCodeTech/ctf-agent-benchmark
Benchmarking platform for evaluating AI agents on CTF-style tasks and tool-use workflows.
This platform helps cybersecurity researchers and competitive hacking teams evaluate how well AI agents can autonomously identify and exploit vulnerabilities in CTF (Capture The Flag) security challenges. It takes an AI agent as input, supplies it with CTF challenges and tooling, and outputs a score based on its ability to exploit the challenges and submit valid flags. Security researchers, ethical hackers, and AI security developers are the intended users of this platform.
Use this if you need a standardized environment to benchmark and compare the autonomous hacking capabilities of different large language model (LLM) agents on security tasks.
Not ideal if you are looking for a tool to manually practice CTF challenges or a general-purpose AI development framework.
Stars: 10
Forks: —
Language: Python
License: GPL-3.0
Category: —
Last pushed: Mar 14, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/FishCodeTech/ctf-agent-benchmark"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
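If you prefer calling the endpoint from Python instead of curl, a minimal sketch follows. It only assumes what the listing above states: the URL pattern `…/api/v1/quality/agents/<owner>/<repo>` and a JSON response; the field names in that response are not documented here, so the code returns the parsed payload as-is rather than assuming a schema. The function names `agent_quality_url` and `fetch_quality` are illustrative, not part of the API.

```python
import json
import urllib.request

# Base endpoint as shown in the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/agents"


def agent_quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a GitHub-style owner/repo slug."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as parsed JSON.

    Anonymous access is limited to 100 requests/day; pass-through of the
    response schema is intentional since the fields are undocumented here.
    """
    with urllib.request.urlopen(agent_quality_url(owner, repo)) as resp:
        return json.load(resp)
```

Usage: `fetch_quality("FishCodeTech", "ctf-agent-benchmark")` mirrors the curl command above.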
Higher-rated alternatives
alvinreal/awesome-autoresearch
A curated list of autonomous improvement loops, research agents, and autoresearch-style systems...
0xSteph/pentest-ai-agents
Turn Claude Code into your offensive security research assistant. Specialized AI subagents for...
wulinteousa2-hash/napari-chat-assistant
A local agent architecture for semantic-aware interaction between large language models and...
saksham-jain177/AI-Agent-based-Deep-Research
Deep Research AI Agent is a dual-agent system that conducts web-based research and generates...
theam/limina
Autonomous research harness for AI agents. Give it a measurable goal — it hypothesizes,...