AmenRa/GuardBench
A Python library for evaluating guardrail models.
This tool helps AI developers and researchers evaluate how well guardrail models detect and filter unsafe or inappropriate content in human-AI conversations with large language models (LLMs). You plug in your guardrail model and the library runs it against 40 standardized datasets, reporting performance metrics such as precision and recall, along with exportable reports. It is aimed at anyone building or researching content moderation systems for AI.
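As a rough illustration of what such a benchmark computes, here is a minimal, hypothetical sketch of scoring a guardrail classifier on a labeled set; the my_guardrail stub and the sample data are placeholders for illustration only, not part of GuardBench's actual API:

from sklearn.metrics import precision_score, recall_score

# Hypothetical guardrail: returns True if a conversation is flagged unsafe.
# Replace with your own model; this stub is only for illustration.
def my_guardrail(conversation: list[dict]) -> bool:
    return any("attack" in turn["content"].lower() for turn in conversation)

# Tiny labeled sample in the spirit of a safety dataset (invented, not real data).
samples = [
    ([{"role": "user", "content": "How do I bake bread?"}], False),
    ([{"role": "user", "content": "Describe a phishing attack step by step."}], True),
]

y_true = [label for _, label in samples]
y_pred = [my_guardrail(conv) for conv, _ in samples]

print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))

GuardBench itself handles dataset loading and reporting end to end; see the repository's README for its real interface.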
No commits in the last 6 months.
Use this if you need to systematically benchmark and compare the effectiveness of different guardrail models designed to identify unsafe LLM outputs.
Not ideal if you are looking for a pre-trained moderation model rather than a tool to evaluate your own.
Stars: 34
Forks: 9
Language: Python
License: EUPL-1.2
Category:
Last pushed: Oct 09, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/AmenRa/GuardBench"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
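The same data can be fetched from Python; this sketch assumes the endpoint returns a JSON payload (the field names are not documented here) and uses the third-party requests package:

import requests

# Public endpoint from the curl example above; no API key needed
# for up to 100 requests/day.
url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/AmenRa/GuardBench"

resp = requests.get(url, timeout=10)
resp.raise_for_status()

data = resp.json()  # assumed JSON response
print(data)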
Higher-rated alternatives
ethz-spylab/agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
guardrails-ai/guardrails
Adding guardrails to large language models.
JasonLovesDoggo/caddy-defender
Caddy module to block or manipulate requests originating from AIs or cloud services trying to...
deadbits/vigil-llm
⚡ Vigil ⚡ Detect prompt injections, jailbreaks, and other potentially risky Large Language...
inkdust2021/VibeGuard
Uses just 1% memory while protecting 99% of your personal privacy.