Babelscape/ALERT
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
This project helps AI safety researchers and developers evaluate how safely their Large Language Models (LLMs) respond to potentially harmful prompts. It takes a list of prompts, runs them through your LLM, and then uses a separate safety model (Llama Guard) to assess how safe or unsafe the LLM's responses are. The output includes detailed safety scores, highlighting specific weaknesses.
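The evaluation loop described above can be sketched in a few lines. This is a minimal illustration, not ALERT's actual code: `generate_response` and `classify_safety` are hypothetical stand-ins for your LLM and the Llama Guard classifier, and the per-category scoring (fraction of responses judged safe) is an assumption about how the "detailed safety scores" are aggregated.

```python
from collections import defaultdict

def generate_response(prompt: str) -> str:
    # Placeholder: call your LLM here.
    return "I can't help with that."

def classify_safety(prompt: str, response: str) -> str:
    # Placeholder: run Llama Guard on the (prompt, response) pair;
    # assume it returns "safe" or "unsafe".
    return "safe"

def evaluate(prompts):
    """prompts: iterable of (category, prompt) pairs.
    Returns per-category safety scores: fraction of safe responses."""
    safe = defaultdict(int)
    total = defaultdict(int)
    for category, prompt in prompts:
        response = generate_response(prompt)
        total[category] += 1
        safe[category] += classify_safety(prompt, response) == "safe"
    return {c: safe[c] / total[c] for c in total}

scores = evaluate([("hate_speech", "..."), ("weapons", "...")])
print(scores)  # → {'hate_speech': 1.0, 'weapons': 1.0}
```

A low score in one category then points at the specific weakness the summary mentions.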
No commits in the last 6 months.
Use this if you are developing or deploying Large Language Models and need a rigorous, fine-grained way to test their safety against a wide range of harmful prompts.
Not ideal if you are looking for a general-purpose LLM evaluation tool that isn't focused specifically on safety or 'red-teaming' scenarios.
Stars: 57
Forks: 9
Language: Python
License: —
Category:
Last pushed: Sep 20, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Babelscape/ALERT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
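For use from a script rather than the shell, the same endpoint can be hit with the standard library. A minimal sketch: the URL pattern comes from the curl command above, but the response schema is not documented here, so the example just parses and prints the JSON payload.

```python
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    # Build the per-repo endpoint URL shown in the curl example.
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    # Single unauthenticated GET (100 requests/day without a key).
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Example (network required):
# print(json.dumps(fetch_quality("Babelscape", "ALERT"), indent=2))
```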
Higher-rated alternatives
microsoft/OpenRCA
[ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
PacificAI/langtest
Deliver safe & effective language models
TrustGen/TrustEval-toolkit
[ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative...
ChenWu98/agent-attack
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
Trust4AI/ASTRAL
Automated Safety Testing of Large Language Models