alibaba/sec-code-bench
SecCodeBench is a benchmark suite for evaluating the security of code generated by large language models (LLMs).
The project helps security researchers and AI model developers rigorously test the security of code produced by LLMs and advanced coding agents. It takes in AI-generated code, runs comprehensive functional and security tests, and outputs a detailed security score and vulnerability report. Its primary users are security experts evaluating and improving the safety of AI-powered coding tools.
Use this if you need a robust, real-world benchmark for how securely your AI coding assistant or LLM generates and fixes code, especially against known vulnerabilities.
It is not a good fit if you want a tool to secure your own manually written code, or a simple static analyzer for general software development.
Stars: 97
Forks: 17
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 09, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/alibaba/sec-code-bench"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
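If you prefer to consume the endpoint from a script rather than curl, a minimal Python sketch follows. The URL matches the curl command above; the response field names in the sample payload (e.g. "stars", "forks") are assumptions for illustration, not documented API fields.

```python
import json
from urllib.parse import quote

# Base endpoint from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def repo_url(owner: str, repo: str) -> str:
    """Build the API URL for a given GitHub repository."""
    return f"{API_BASE}/{quote(owner)}/{quote(repo)}"

def parse_stats(payload: str) -> dict:
    """Parse the JSON body returned by the endpoint."""
    return json.loads(payload)

# Hypothetical response body -- field names are assumptions:
sample = '{"stars": 97, "forks": 17, "language": "Python"}'
stats = parse_stats(sample)

print(repo_url("alibaba", "sec-code-bench"))
print(stats["stars"])
```

To fetch live data, pass `repo_url(...)` to any HTTP client (e.g. `urllib.request.urlopen`), keeping the 100 requests/day limit in mind.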
Related tools
sierra-research/tau2-bench
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
bigcode-project/bigcodebench
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems