Babelscape/ALERT
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
This project helps AI safety researchers and developers evaluate how safely their Large Language Models (LLMs) respond to potentially harmful prompts. It takes a list of prompts, runs them through your LLM, and then uses a separate safety model (Llama Guard) to assess how safe or unsafe the LLM's responses are. The output includes detailed safety scores, highlighting specific weaknesses.
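The evaluation loop described above can be sketched in a few lines. This is a minimal illustration, not ALERT's actual code: `generate_response` and `classify_safety` are hypothetical stand-ins for your LLM and the Llama Guard classifier, and the per-category scoring (fraction of responses judged safe) is an assumption about how the "detailed safety scores" are aggregated.

```python
from collections import defaultdict

def generate_response(prompt: str) -> str:
    # Placeholder: call your LLM here.
    return "I can't help with that."

def classify_safety(prompt: str, response: str) -> str:
    # Placeholder: run Llama Guard on the (prompt, response) pair;
    # assume it returns "safe" or "unsafe".
    return "safe"

def evaluate(prompts):
    """prompts: iterable of (category, prompt) pairs.
    Returns per-category safety scores: fraction of safe responses."""
    safe = defaultdict(int)
    total = defaultdict(int)
    for category, prompt in prompts:
        response = generate_response(prompt)
        total[category] += 1
        safe[category] += classify_safety(prompt, response) == "safe"
    return {c: safe[c] / total[c] for c in total}

scores = evaluate([("hate_speech", "..."), ("weapons", "...")])
print(scores)  # → {'hate_speech': 1.0, 'weapons': 1.0}
```

A low score in one category then points at the specific weakness the summary mentions.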
No commits in the last 6 months.
Use this if you are developing or deploying Large Language Models and need a rigorous, fine-grained way to test their safety against a wide range of harmful prompts.
Not ideal if you are looking for a general-purpose LLM evaluation tool that isn't focused specifically on safety or 'red-teaming' scenarios.
Stars: 57
Forks: 9
Language: Python
License: —
Category:
Last pushed: Sep 20, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Babelscape/ALERT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
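For use from a script rather than the shell, the same endpoint can be hit with the standard library. A minimal sketch: the URL pattern comes from the curl command above, but the response schema is not documented here, so the example just parses and prints the JSON payload.

```python
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    # Build the per-repo endpoint URL shown in the curl example.
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    # Single unauthenticated GET (100 requests/day without a key).
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Example (network required):
# print(json.dumps(fetch_quality("Babelscape", "ALERT"), indent=2))
```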
Higher-rated alternatives
microsoft/OpenRCA
[ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
PacificAI/langtest
Deliver safe & effective language models
TrustGen/TrustEval-toolkit
[ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative...
ChenWu98/agent-attack
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
Trust4AI/ASTRAL
Automated Safety Testing of Large Language Models