lancopku/DAN

[Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks

Score: 21 / 100 (Experimental)

This tool helps AI safety researchers and machine learning engineers detect backdoors in pre-trained language models used for text classification. It takes a pre-trained, potentially poisoned text model and identifies whether malicious triggers (such as rare words or phrases) can force the model to misclassify inputs. The output is a score indicating the likelihood of a backdoor, helping you determine whether a model is compromised.
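DAN itself is feature-based: it scores inputs by how far their hidden representations fall from clean-data statistics. Below is a minimal sketch of that idea, assuming class-conditional Gaussians and a Mahalanobis-distance score over placeholder features; the names and data are illustrative, not the repository's actual interface.

# Sketch of a feature-based anomaly score in the spirit of DAN:
# fit per-class Gaussians on clean validation features, then flag
# test inputs whose features are far (in Mahalanobis distance) from
# every class. All names and data here are hypothetical placeholders.
import numpy as np

def fit_class_gaussians(feats, labels):
    """Per-class means plus a shared precision matrix from clean features."""
    classes = np.unique(labels)
    means = {c: feats[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([feats[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return means, np.linalg.inv(cov)

def anomaly_score(x, means, prec):
    """Smallest Mahalanobis distance over classes; high = likely poisoned."""
    return min(float((x - mu) @ prec @ (x - mu)) for mu in means.values())

# Toy demo with random stand-ins for [CLS] features.
rng = np.random.default_rng(0)
clean_feats = rng.normal(size=(200, 16))
clean_labels = rng.integers(0, 2, size=200)
means, prec = fit_class_gaussians(clean_feats, clean_labels)

test_input = rng.normal(loc=3.0, size=16)      # shifted: looks anomalous
print(anomaly_score(test_input, means, prec))  # large score flags it

In real usage, the features would come from the suspect model's intermediate layers on a small clean validation set rather than random vectors.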

No commits in the last 6 months.

Use this if you need to evaluate the security and trustworthiness of a pre-trained text classification model against sophisticated textual backdoor attacks.

Not ideal if you are looking to defend against non-textual backdoor attacks or if you need a solution for training-time defense rather than post-training analysis.

AI Safety · NLP Security · Model Auditing · Adversarial AI · Text Classification
Stale (6 months) · No Package · No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 0 / 25

How are scores calculated?
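Each category is worth 25 points, and the subscores above appear to sum to the overall score: 0 (Maintenance) + 5 (Adoption) + 16 (Maturity) + 0 (Community) = 21 / 100.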

Stars: 12
Forks:
Language: Python
License: MIT
Last pushed: Feb 26, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/lancopku/DAN"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
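A Python equivalent of the curl call above, for scripted use; the response schema is not documented on this page, so the example only prints the raw payload.

# Fetch the same endpoint with the requests library. The JSON field
# names are not documented here, so we print the payload as-is.
import requests

resp = requests.get(
    "https://pt-edge.onrender.com/api/v1/quality/nlp/lancopku/DAN",
    timeout=10,
)
resp.raise_for_status()  # fail loudly on rate limiting or server errors
print(resp.json())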