alibaba/sec-code-bench
SecCodeBench is a benchmark suite for evaluating the security of code generated by large language models (LLMs).
The project helps security researchers and AI model developers rigorously test the security of code produced by LLMs and advanced coding agents. It takes in AI-generated code, runs comprehensive functional and security tests, and outputs a detailed security score and vulnerability report. Its primary users are security experts evaluating and improving the safety of AI-powered coding tools.
Use this if you need a robust, real-world benchmark for how securely your AI coding assistant or LLM generates and fixes code, especially against known vulnerabilities.
It is not a good fit if you want a tool to secure your own manually written code, or a simple static analyzer for general software development.
Stars: 97
Forks: 17
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 09, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/alibaba/sec-code-bench"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
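If you prefer to consume the endpoint from a script rather than curl, a minimal Python sketch follows. The URL matches the curl command above; the response field names in the sample payload (e.g. "stars", "forks") are assumptions for illustration, not documented API fields.

```python
import json
from urllib.parse import quote

# Base endpoint from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def repo_url(owner: str, repo: str) -> str:
    """Build the API URL for a given GitHub repository."""
    return f"{API_BASE}/{quote(owner)}/{quote(repo)}"

def parse_stats(payload: str) -> dict:
    """Parse the JSON body returned by the endpoint."""
    return json.loads(payload)

# Hypothetical response body -- field names are assumptions:
sample = '{"stars": 97, "forks": 17, "language": "Python"}'
stats = parse_stats(sample)

print(repo_url("alibaba", "sec-code-bench"))
print(stats["stars"])
```

To fetch live data, pass `repo_url(...)` to any HTTP client (e.g. `urllib.request.urlopen`), keeping the 100 requests/day limit in mind.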
Related tools
sierra-research/tau2-bench
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
bigcode-project/bigcodebench
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems