txsun1997/Metric-Fairness

EMNLP'2022: BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation

Score: 33 / 100 (Emerging)

When evaluating text generation models (such as machine translation or summarization systems), you rely on automated metrics to assess quality. This project helps you determine whether popular language model-based metrics, such as BERTScore, are inadvertently biased with respect to social attributes like gender or race. It provides tools to measure and mitigate these biases so that your model evaluations stay fair. It is aimed at researchers and practitioners who develop or deploy text generation systems and need to evaluate their performance rigorously.

No commits in the last 6 months.

Use this if you are developing or using text generation models and want to ensure that your evaluation metrics do not reward or penalize models on the basis of social attributes expressed in the generated text.

Not ideal if you are looking for new text generation models, or for general-purpose bias detection in large language models rather than bias in the evaluation metrics themselves.
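To make the idea concrete, here is a minimal sketch of the kind of check the project formalizes: score two candidates that differ only in a gendered word against the same reference and compare. It uses the public bert-score package, not this repository's own scripts, and the sentences are invented for illustration.

```python
# Minimal sketch (not this repository's scripts): probe BERTScore for gender
# bias by scoring a counterfactual pair that differs only in one pronoun.
# Requires the public bert-score package: pip install bert-score
from bert_score import score

candidates = [
    "He finished the shift and went home.",   # male counterfactual
    "She finished the shift and went home.",  # female counterfactual
]
references = ["The doctor finished the shift and went home."] * 2

# bert_score.score returns precision, recall, and F1 tensors, one per pair.
P, R, F1 = score(candidates, references, lang="en", verbose=False)

gap = (F1[0] - F1[1]).item()
print(f"F1 (he): {F1[0].item():.4f}  F1 (she): {F1[1].item():.4f}  gap: {gap:+.4f}")
# A single pair proves nothing; a systematic nonzero gap averaged over many
# such counterfactual pairs is the kind of bias this project measures.
```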

text generation · machine translation · text summarization · model evaluation · algorithmic fairness
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 10 / 25

Stars: 41
Forks: 4
Language: Jupyter Notebook
License: MIT
Last pushed: Oct 19, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/txsun1997/Metric-Fairness"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
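
For programmatic use, here is a minimal Python equivalent of the curl call above. The endpoint is taken from that example; the response schema is not documented on this page, so the sketch simply pretty-prints whatever JSON the API returns.

```python
# Minimal sketch: fetch the same quality data in Python. The endpoint comes
# from the curl example above; the JSON schema is not documented here, so we
# just pretty-print the response.
import json

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/txsun1997/Metric-Fairness"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. rate limiting) early
print(json.dumps(resp.json(), indent=2))
```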