txsun1997/Metric-Fairness
EMNLP'2022: BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation
When evaluating text generation models (for machine translation, summarization, and the like), you rely on automated metrics to assess quality. This project helps you determine whether popular language model-based metrics, such as BERTScore, systematically favor or disfavor text associated with social attributes like gender or race. It provides tools to measure and mitigate these biases so that your model evaluations are fair. It is aimed at researchers and practitioners who develop or deploy text generation systems and need to evaluate their performance rigorously.
No commits in the last 6 months.
Use this if you develop or use text generation models and want to ensure that your evaluation metrics do not reward or penalize outputs based on social attributes mentioned in the generated text.
Not ideal if you are looking for new text generation models, or for generic bias detection in large language models rather than bias in their evaluation metrics specifically.
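To make the core idea concrete, here is a minimal sketch of counterfactual metric-bias probing: score paired sentences that differ only in a protected-attribute word and compare the results. The `toy_metric` below is a hypothetical stand-in (token-overlap F1), not the project's actual method; in practice you would plug in BERTScore or another learned metric.

```python
def toy_metric(candidate: str, reference: str) -> float:
    """Hypothetical similarity metric: token-overlap F1 (stand-in for BERTScore)."""
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(cand), overlap / len(ref)
    return 2 * p * r / (p + r)

def bias_gap(pairs, template, metric=toy_metric):
    """Average score difference between counterfactual candidate pairs.

    `pairs` holds (group_a_word, group_b_word) substitutions for "{person}".
    A gap far from 0 suggests the metric prefers one group's wording.
    """
    gaps = []
    for a, b in pairs:
        ref = template.format(person=a)  # reference mentions group A
        s_a = metric(template.format(person=a), ref)
        s_b = metric(template.format(person=b), ref)
        gaps.append(s_a - s_b)
    return sum(gaps) / len(gaps)

pairs = [("he", "she"), ("man", "woman")]
gap = bias_gap(pairs, "{person} is a skilled surgeon")
print(f"average score gap: {gap:+.3f}")  # → average score gap: +0.200
```

Even this toy overlap metric shows a nonzero gap, because swapping the attribute word changes the surface match against the reference; the paper's point is that learned metrics can encode deeper, socially skewed preferences of the same shape.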
Stars: 41
Forks: 4
Language: Jupyter Notebook
License: MIT
Category: nlp
Last pushed: Oct 19, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/txsun1997/Metric-Fairness"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
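The same data can be fetched from Python instead of curl. This is a sketch: the endpoint and path layout (`category/owner/repo`) are inferred from the curl example above, and the JSON response fields are not documented here, so the fetch helper just returns whatever the server sends.

```python
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def repo_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-data URL for a repo (path layout inferred from the curl example)."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload (requires network access)."""
    with urlopen(repo_quality_url(category, owner, repo), timeout=10) as resp:
        return json.load(resp)

url = repo_quality_url("nlp", "txsun1997", "Metric-Fairness")
print(url)
# fetch_quality("nlp", "txsun1997", "Metric-Fairness")  # uncomment to hit the live API
```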
Higher-rated alternatives
DerwenAI/pytextrank
Python implementation of TextRank algorithms ("textgraphs") for phrase extraction
Tiiiger/bert_score
BERT score for text generation
BrikerMan/Kashgari
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for...
asyml/texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. ...
yohasebe/wp2txt
A command-line tool to extract plain text from Wikipedia dumps with category and section filtering