YangLinyi/GLUE-X

We leverage 14 datasets as OOD test data and evaluate 21 widely used models on 8 NLU tasks. Our findings confirm that OOD accuracy in NLP tasks deserves more attention, since a significant performance drop relative to ID accuracy was observed in all settings.

Score: 21 / 100 (Experimental)

This project helps machine learning engineers and NLP researchers evaluate the robustness of their natural language understanding models. It takes existing models and tests them against 14 diverse datasets that represent out-of-domain scenarios. The output reveals how well a model generalizes to new, unseen text, highlighting potential performance drops compared to in-domain accuracy.

No commits in the last 6 months.

Use this if you are a machine learning engineer or NLP researcher concerned about your model's real-world reliability and generalization beyond its original training data.

Not ideal if you are looking for a tool to train or fine-tune a language model, as this project focuses solely on out-of-distribution evaluation.

Tags: Natural Language Processing · Machine Learning Evaluation · Model Robustness · Out-of-Distribution Generalization · NLP Research
Badges: No License · Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 4 / 25


Stars: 93
Forks: 2
Language: Python
License: none
Last pushed: Aug 15, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/YangLinyi/GLUE-X"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
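The same endpoint can be queried from Python. The sketch below is a minimal example assuming the endpoint returns a JSON payload; the `fetch_quality` helper and any response field names are illustrative, not a documented client API.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-data endpoint URL for a repository."""
    return f"{API_BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload (schema is an assumption)."""
    with urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Same request as the curl example above:
    print(quality_url("nlp", "YangLinyi", "GLUE-X"))
```

Remember the anonymous tier is limited to 100 requests/day, so cache responses if you poll many repositories.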