lancopku/DAN
[Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks
This tool helps AI safety researchers and machine learning engineers detect backdoors in language models built for text classification. It takes a pre-trained, potentially poisoned text model and identifies whether malicious triggers (such as rare words or phrases) can force the model to misclassify inputs. The output is a score indicating the likelihood of a backdoor, helping you determine whether a model is compromised.
No commits in the last 6 months.
Use this if you need to evaluate the security and trustworthiness of a pre-trained text classification model against sophisticated textual backdoor attacks.
Not ideal if you are looking to defend against non-textual backdoor attacks or if you need a solution for training-time defense rather than post-training analysis.
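To make the feature-based idea concrete: a defense of this kind typically fits class-conditional statistics on the hidden features of known-clean examples, then flags inputs whose features are unusually far from every class. The minimal sketch below uses a Mahalanobis distance as the anomaly score; it illustrates the general technique, not necessarily DAN's exact formulation, and all function names are hypothetical.

```python
import numpy as np

def anomaly_scores(clean_feats, clean_labels, query_feats):
    """Score queries by Mahalanobis distance to the nearest
    class-conditional Gaussian fit on clean features.
    Higher score = more anomalous (possible triggered input)."""
    classes = np.unique(clean_labels)
    means = {c: clean_feats[clean_labels == c].mean(axis=0) for c in classes}
    # Shared (tied) covariance across classes, with a small ridge for stability.
    centered = np.vstack([clean_feats[clean_labels == c] - means[c] for c in classes])
    cov = centered.T @ centered / len(centered)
    cov += 1e-6 * np.eye(clean_feats.shape[1])
    prec = np.linalg.inv(cov)
    scores = []
    for x in query_feats:
        # Distance to the closest class mean under the shared covariance.
        d = min(float((x - means[c]) @ prec @ (x - means[c])) for c in classes)
        scores.append(d)
    return np.array(scores)
```

In practice the features would be a model's intermediate representations (e.g. the [CLS] vector) for each input, and a threshold on the score separates clean from suspicious inputs.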
Stars: 12
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Feb 26, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/lancopku/DAN"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
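The same endpoint can be queried programmatically. The sketch below builds the request URL and shows where the JSON response would be decoded; the response field names depend on the API's schema and are not assumed here.

```python
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL (category and repo
    come from the path shown in the curl example above)."""
    return f"{BASE}/{category}/{repo}"

url = quality_url("nlp", "lancopku/DAN")
# data = json.load(urlopen(url))  # available fields depend on the API's schema
```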
Higher-rated alternatives
thunlp/OpenAttack
An Open-Source Package for Textual Adversarial Attack.
thunlp/TAADpapers
Must-read Papers on Textual Adversarial Attack and Defense
jind11/TextFooler
A Model for Natural Language Attack on Text Classification and Inference
thunlp/OpenBackdoor
An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)
thunlp/SememePSO-Attack
Code and data of the ACL 2020 paper "Word-level Textual Adversarial Attacking as Combinatorial...