microsoft/SafeNLP
Safety Score for Pre-Trained Language Models
This project evaluates the safety of pre-trained language models. Given a model and a dataset of sentences that may contain harmful language or implicit hate, it produces a 'safety score' for the model, broken down by demographic group, indicating how likely the model is to generate toxic content about each group. It is aimed at AI ethics researchers and product managers who want to verify that their models are fair and unbiased.
No commits in the last 6 months.
Use this if you are developing or deploying a language model and need to quantify its potential for generating harmful or biased text towards different demographic groups.
Not ideal if you are looking to fine-tune a model for safety directly or need to detect harmful content in real-time user-generated text.
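The snippet below is a rough illustration of the kind of evaluation described above, not SafeNLP's actual code: it compares the average negative log-likelihood a causal language model assigns to harmful versus neutral sentences, grouped by demographic. The model name, the toy records, and the scoring rule are all placeholder assumptions.

```python
# Illustrative sketch only -- not SafeNLP's implementation.
# Assumes a Hugging Face causal LM and toy (group, label, sentence) records.
from collections import defaultdict
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sentence_nll(text: str) -> float:
    """Average negative log-likelihood the model assigns to a sentence."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()

# Toy records; a real run would use a labeled dataset of harmful/neutral sentences.
records = [
    ("group_a", "harmful", "example of an implicitly harmful sentence"),
    ("group_a", "neutral", "example of a neutral sentence about the group"),
]

nlls = defaultdict(lambda: defaultdict(list))
for group, label, text in records:
    nlls[group][label].append(sentence_nll(text))

# One possible per-group proxy: how much less likely the model finds
# harmful sentences than neutral ones (a larger gap suggests safer behavior).
for group, by_label in nlls.items():
    harmful = sum(by_label["harmful"]) / len(by_label["harmful"])
    neutral = sum(by_label["neutral"]) / len(by_label["neutral"])
    print(f"{group}: harmful NLL {harmful:.2f} vs neutral NLL {neutral:.2f}")
```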
Stars: 96
Forks: 5
Language: Python
License: —
Category:
Last pushed: Oct 18, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/microsoft/SafeNLP"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
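For a scripted lookup, a minimal Python sketch of calling the same endpoint is shown below; the response fields are not documented here, so it simply pretty-prints whatever JSON comes back.

```python
# Fetch the quality data for microsoft/SafeNLP from the public endpoint above.
import json
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/nlp/microsoft/SafeNLP"
resp = requests.get(url, timeout=30)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```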
Higher-rated alternatives
dccuchile/wefe
WEFE: The Word Embeddings Fairness Evaluation Framework. WEFE is a framework that standardizes...
dreji18/Fairness-in-AI
Detecting Bias and ensuring Fairness in AI solutions
amazon-science/bold
Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language...
dhfbk/variationist
Variationist: Exploring Multifaceted Variation and Bias in Written Language Data (ACL 2024 demo track)
soarsmu/BiasFinder
BiasFinder | IEEE TSE | Metamorphic Test Generation to Uncover Bias for Sentiment Analysis Systems