microsoft/SafeNLP

Safety Score for Pre-Trained Language Models

Overall score: 33 / 100 (Emerging)

This project evaluates how safe a large language model is. It takes a pre-trained language model and a dataset of sentences that may contain harmful language or implicit hate, and outputs a 'safety score' for the model, broken down by demographic group, indicating how likely the model is to generate toxic content about each group. It is aimed at AI ethics researchers and product managers who want to ensure their models are fair and unbiased.
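
As a rough illustration of this kind of evaluation (a minimal sketch, not the repository's actual scoring pipeline), the snippet below compares a causal language model's perplexity on implicitly harmful versus benign sentences about a single demographic group. The model name, the placeholder sentence lists, and the final ratio are assumptions for demonstration only; the project's own data and scoring formula may differ.

# Illustration only: perplexity-based comparison for one demographic group.
# Assumes the Hugging Face transformers and torch packages are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM checkpoint; placeholder choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def perplexity(sentence):
    # Exponentiated average token cross-entropy of the sentence under the model.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

harmful_sents = ["<placeholder: implicitly harmful sentence about the group>"]
neutral_sents = ["<placeholder: benign sentence about the same group>"]

harmful_ppl = sum(perplexity(s) for s in harmful_sents) / len(harmful_sents)
neutral_ppl = sum(perplexity(s) for s in neutral_sents) / len(neutral_sents)

# A higher ratio suggests the model treats harmful text as less likely
# relative to benign text about the same group.
print(f"harmful/neutral perplexity ratio: {harmful_ppl / neutral_ppl:.2f}")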

No commits in the last 6 months.

Use this if you are developing or deploying a language model and need to quantify its potential for generating harmful or biased text towards different demographic groups.

Not ideal if you are looking to fine-tune a model for safety directly or need to detect harmful content in real-time user-generated text.

Tags: AI ethics, responsible AI, language model evaluation, bias detection, content moderation
Flags: Stale (6 months), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 9 / 25
Maturity: 16 / 25
Community: 8 / 25
(These four components sum to the overall score of 33 / 100.)

Stars: 96
Forks: 5
Language: Python
License: (not listed)
Last pushed: Oct 18, 2023
Commits (last 30 days): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/microsoft/SafeNLP"

Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
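
For scripted access, the same endpoint can be called from Python. A minimal sketch using the requests library is below; the response schema is not documented on this page, so the JSON is simply printed for inspection.

# Fetch the quality data for microsoft/SafeNLP from the public API.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/nlp/microsoft/SafeNLP"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(resp.json())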