YangLinyi/GLUE-X
We leverage 14 datasets as OOD test data and conduct evaluations on 8 NLU tasks over 21 popular models. Our findings confirm that OOD accuracy in NLP tasks deserves more attention: a significant performance drop relative to ID accuracy was observed in every setting.
This project helps machine learning engineers and NLP researchers evaluate the robustness of their natural language understanding models. It takes existing models and tests them against 14 diverse datasets that represent 'out-of-domain' scenarios. The output reveals how well a model generalizes to new, unseen text, highlighting potential performance drops compared to in-domain accuracy.
No commits in the last 6 months.
Use this if you are a machine learning engineer or NLP researcher concerned about your model's real-world reliability and generalization beyond its original training data.
Not ideal if you are looking for a tool to train or fine-tune a language model, as this project focuses solely on out-of-distribution evaluation.
Stars
93
Forks
2
Language
Python
License
—
Category
Last pushed
Aug 15, 2023
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/YangLinyi/GLUE-X"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
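The same endpoint can be called from Python. A minimal sketch, assuming only the URL pattern shown in the curl example above; the shape of the JSON response is not documented here, so the decoded dictionary's keys are unknown and the helper names (`quality_url`, `fetch_quality`) are illustrative, not part of any official client.

```python
import json
import urllib.request

# Endpoint pattern taken from the curl example above; the response
# schema is an assumption and may differ from what the API returns.
BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and JSON-decode the quality record for one repository."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Prints the URL used in the curl example above.
    print(quality_url("YangLinyi", "GLUE-X"))
```

Within the free tier (100 requests/day without a key), this is enough to pull the record for any listed repository.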
Higher-rated alternatives
thunlp/OpenAttack
An Open-Source Package for Textual Adversarial Attack.
thunlp/TAADpapers
Must-read Papers on Textual Adversarial Attack and Defense
jind11/TextFooler
A Model for Natural Language Attack on Text Classification and Inference
thunlp/OpenBackdoor
An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)
thunlp/HiddenKiller
Code and data of the ACL-IJCNLP 2021 paper "Hidden Killer: Invisible Textual Backdoor Attacks...