AmenRa/GuardBench

A Python library for evaluating guardrail models.

Quality score: 42 / 100 (Emerging)

This tool helps AI developers and researchers evaluate how well guardrail models detect and filter out unsafe or inappropriate content in human-AI (LLM) conversations. You provide your guardrail model, GuardBench runs it against 40 standardized datasets, and it outputs performance metrics such as precision and recall along with exportable reports. It is aimed at anyone building or researching content moderation systems for AI.
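
As an illustration of the kind of metrics such a benchmark reports, here is a minimal sketch that computes precision and recall for a binary safe/unsafe classifier using scikit-learn on made-up labels; this is not GuardBench's own API:

from sklearn.metrics import precision_score, recall_score

# Made-up example: 1 = unsafe content, 0 = safe content.
# true_labels would come from a benchmark dataset; predictions from a guardrail model.
true_labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]

precision = precision_score(true_labels, predictions)  # share of flagged items that were truly unsafe
recall = recall_score(true_labels, predictions)        # share of unsafe items that were caught
print(f"precision={precision:.2f} recall={recall:.2f}")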

No commits in the last 6 months.

Use this if you need to systematically benchmark and compare the effectiveness of different guardrail models designed to identify unsafe LLM outputs.

Not ideal if you are looking for a pre-trained moderation model rather than a tool to evaluate your own.

AI safety, content moderation, LLM evaluation, natural language processing, responsible AI
Stale (6 months), No Package, No Dependents
Maintenance: 2 / 25
Adoption: 7 / 25
Maturity: 16 / 25
Community: 17 / 25


Stars: 34

Forks: 9

Language: Python

License: EUPL-1.2

Last pushed: Oct 09, 2025

Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/AmenRa/GuardBench"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
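
The same report can also be fetched programmatically. A minimal Python sketch, assuming the endpoint returns JSON and the requests library is installed; the response fields are not documented here:

import requests

# Endpoint shown above; the free tier allows 100 requests/day without a key.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/AmenRa/GuardBench"

response = requests.get(API_URL, timeout=10)
response.raise_for_status()

report = response.json()  # assumed JSON payload; inspect it to see the available fields
print(report)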