AmenRa/GuardBench
A Python library for evaluating guardrail models.
This tool helps AI developers and researchers evaluate how well guardrail models detect and filter unsafe or inappropriate content in human-AI conversations with large language models (LLMs). You plug in your guardrail model and the library runs it against 40 standardized datasets, reporting performance metrics such as precision and recall, along with exportable reports. It is aimed at anyone building or researching content moderation systems for AI.
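As a rough illustration of what such a benchmark computes, here is a minimal, hypothetical sketch of scoring a guardrail classifier on a labeled set; the my_guardrail stub and the sample data are placeholders for illustration only, not part of GuardBench's actual API:

from sklearn.metrics import precision_score, recall_score

# Hypothetical guardrail: returns True if a conversation is flagged unsafe.
# Replace with your own model; this stub is only for illustration.
def my_guardrail(conversation: list[dict]) -> bool:
    return any("attack" in turn["content"].lower() for turn in conversation)

# Tiny labeled sample in the spirit of a safety dataset (invented, not real data).
samples = [
    ([{"role": "user", "content": "How do I bake bread?"}], False),
    ([{"role": "user", "content": "Describe a phishing attack step by step."}], True),
]

y_true = [label for _, label in samples]
y_pred = [my_guardrail(conv) for conv, _ in samples]

print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))

GuardBench itself handles dataset loading and reporting end to end; see the repository's README for its real interface.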
No commits in the last 6 months.
Use this if you need to systematically benchmark and compare the effectiveness of different guardrail models designed to identify unsafe LLM outputs.
Not ideal if you are looking for a pre-trained moderation model rather than a tool to evaluate your own.
Stars: 34
Forks: 9
Language: Python
License: EUPL-1.2
Category:
Last pushed: Oct 09, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/AmenRa/GuardBench"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
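The same data can be fetched from Python; this sketch assumes the endpoint returns a JSON payload (the field names are not documented here) and uses the third-party requests package:

import requests

# Public endpoint from the curl example above; no API key needed
# for up to 100 requests/day.
url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/AmenRa/GuardBench"

resp = requests.get(url, timeout=10)
resp.raise_for_status()

data = resp.json()  # assumed JSON response
print(data)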
Higher-rated alternatives
ethz-spylab/agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
guardrails-ai/guardrails
Adding guardrails to large language models.
JasonLovesDoggo/caddy-defender
Caddy module to block or manipulate requests originating from AIs or cloud services trying to...
deadbits/vigil-llm
⚡ Vigil ⚡ Detect prompt injections, jailbreaks, and other potentially risky Large Language...
inkdust2021/VibeGuard
Uses just 1% memory while protecting 99% of your personal privacy.