txsun1997/Metric-Fairness

EMNLP'2022: BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation

Score: 33 / 100 (Emerging)

When evaluating text generation models (such as machine translation or summarization systems), you rely on automated metrics to assess quality. This project helps you determine whether popular language model-based metrics, such as BERTScore, are inadvertently biased with respect to social attributes like gender or race. It provides tools to measure and mitigate these biases so that your model evaluations stay fair. It is aimed at researchers and practitioners who develop or deploy text generation systems and need to evaluate their performance rigorously.

No commits in the last 6 months.

Use this if you are developing or using text generation models and want to ensure that your evaluation metrics do not reward or penalize models on the basis of social attributes expressed in the generated text.

Not ideal if you are looking for new text generation models, or for general-purpose bias detection in large language models rather than bias in the evaluation metrics themselves.
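To make the idea concrete, here is a minimal sketch of the kind of check the project formalizes: score two candidates that differ only in a gendered word against the same reference and compare. It uses the public bert-score package, not this repository's own scripts, and the sentences are invented for illustration.

```python
# Minimal sketch (not this repository's scripts): probe BERTScore for gender
# bias by scoring a counterfactual pair that differs only in one pronoun.
# Requires the public bert-score package: pip install bert-score
from bert_score import score

candidates = [
    "He finished the shift and went home.",   # male counterfactual
    "She finished the shift and went home.",  # female counterfactual
]
references = ["The doctor finished the shift and went home."] * 2

# bert_score.score returns precision, recall, and F1 tensors, one per pair.
P, R, F1 = score(candidates, references, lang="en", verbose=False)

gap = (F1[0] - F1[1]).item()
print(f"F1 (he): {F1[0].item():.4f}  F1 (she): {F1[1].item():.4f}  gap: {gap:+.4f}")
# A single pair proves nothing; a systematic nonzero gap averaged over many
# such counterfactual pairs is the kind of bias this project measures.
```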

text generation · machine translation · text summarization · model evaluation · algorithmic fairness
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 10 / 25

Stars: 41
Forks: 4
Language: Jupyter Notebook
License: MIT
Last pushed: Oct 19, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/txsun1997/Metric-Fairness"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
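
For programmatic use, here is a minimal Python equivalent of the curl call above. The endpoint is taken from that example; the response schema is not documented on this page, so the sketch simply pretty-prints whatever JSON the API returns.

```python
# Minimal sketch: fetch the same quality data in Python. The endpoint comes
# from the curl example above; the JSON schema is not documented here, so we
# just pretty-print the response.
import json

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/txsun1997/Metric-Fairness"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. rate limiting) early
print(json.dumps(resp.json(), indent=2))
```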