lancopku/DAN
[Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks
This tool helps AI safety researchers and machine learning engineers detect backdoors in language models built for text classification. It takes a pre-trained, potentially poisoned text model and identifies whether malicious triggers (such as rare words or phrases) can force the model to misclassify inputs. The output is a score indicating the likelihood of a backdoor, helping you determine whether a model is compromised.
No commits in the last 6 months.
Use this if you need to evaluate the security and trustworthiness of a pre-trained text classification model against sophisticated textual backdoor attacks.
Not ideal if you are looking to defend against non-textual backdoor attacks or if you need a solution for training-time defense rather than post-training analysis.
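To make the feature-based idea concrete: a defense of this kind typically fits class-conditional statistics on the hidden features of known-clean examples, then flags inputs whose features are unusually far from every class. The minimal sketch below uses a Mahalanobis distance as the anomaly score; it illustrates the general technique, not necessarily DAN's exact formulation, and all function names are hypothetical.

```python
import numpy as np

def anomaly_scores(clean_feats, clean_labels, query_feats):
    """Score queries by Mahalanobis distance to the nearest
    class-conditional Gaussian fit on clean features.
    Higher score = more anomalous (possible triggered input)."""
    classes = np.unique(clean_labels)
    means = {c: clean_feats[clean_labels == c].mean(axis=0) for c in classes}
    # Shared (tied) covariance across classes, with a small ridge for stability.
    centered = np.vstack([clean_feats[clean_labels == c] - means[c] for c in classes])
    cov = centered.T @ centered / len(centered)
    cov += 1e-6 * np.eye(clean_feats.shape[1])
    prec = np.linalg.inv(cov)
    scores = []
    for x in query_feats:
        # Distance to the closest class mean under the shared covariance.
        d = min(float((x - means[c]) @ prec @ (x - means[c])) for c in classes)
        scores.append(d)
    return np.array(scores)
```

In practice the features would be a model's intermediate representations (e.g. the [CLS] vector) for each input, and a threshold on the score separates clean from suspicious inputs.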
Stars: 12
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Feb 26, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/lancopku/DAN"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
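The same endpoint can be queried programmatically. The sketch below builds the request URL and shows where the JSON response would be decoded; the response field names depend on the API's schema and are not assumed here.

```python
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL (category and repo
    come from the path shown in the curl example above)."""
    return f"{BASE}/{category}/{repo}"

url = quality_url("nlp", "lancopku/DAN")
# data = json.load(urlopen(url))  # available fields depend on the API's schema
```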
Higher-rated alternatives
thunlp/OpenAttack
An Open-Source Package for Textual Adversarial Attack.
thunlp/TAADpapers
Must-read Papers on Textual Adversarial Attack and Defense
jind11/TextFooler
A Model for Natural Language Attack on Text Classification and Inference
thunlp/OpenBackdoor
An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)
thunlp/SememePSO-Attack
Code and data of the ACL 2020 paper "Word-level Textual Adversarial Attacking as Combinatorial...