peng-gao-lab/CTIArena

The first benchmark to evaluate LLM performance on heterogeneous CTI under knowledge-augmented settings.

Score: 18 / 100 (Experimental)

This project helps cybersecurity researchers and developers evaluate how well large language models (LLMs) understand and reason about various types of cyber threat intelligence (CTI). It takes diverse CTI data (structured, unstructured, and hybrid) along with an LLM's responses, and produces performance scores. Cybersecurity researchers, ML engineers in security, and AI developers building CTI analysis tools are the primary users.

Use this if you need to quantitatively measure and compare the effectiveness of different LLMs in processing and understanding complex, real-world cyber threat intelligence.

Not ideal if you are looking for an off-the-shelf CTI analysis tool or a solution for direct threat detection in an operational security environment.

cybersecurity-research threat-intelligence-analysis LLM-benchmarking security-operations AI-in-cybersecurity
No License · No Package · No Dependents
Maintenance 6 / 25
Adoption 5 / 25
Maturity 7 / 25
Community 0 / 25

How are scores calculated?
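The numbers shown suggest (an inference from this card, not documented behavior) that the overall score is simply the sum of the four subscores, each out of 25:

```python
# Subscores shown on the card above (each out of 25).
subscores = {"Maintenance": 6, "Adoption": 5, "Maturity": 7, "Community": 0}

# Hypothetical reconstruction of the overall score: the plain sum, out of 100.
overall = sum(subscores.values())
print(overall)  # 18, matching the 18 / 100 shown above
```
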

Stars: 9
Forks:
Language: Python
License: None
Last pushed: Oct 15, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/peng-gao-lab/CTIArena"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
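The same request can be made from Python. This is a minimal sketch assuming the endpoint returns JSON (only the curl URL above is documented); the function name and signature are illustrative:

```python
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record for a repo.

    Assumes the endpoint returns a JSON body; subject to the
    rate limits noted above (100 requests/day without a key).
    """
    url = f"{BASE}/{owner}/{repo}"
    with urlopen(url) as resp:
        return json.load(resp)

# Usage (performs a live request):
# data = fetch_quality("peng-gao-lab", "CTIArena")
```
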