OSU-NLP-Group/AmpleGCG
AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
This project helps AI safety researchers and red teamers efficiently test the robustness of large language models (LLMs). Given a harmful query, it generates multiple "adversarial suffixes" (seemingly gibberish strings) that can be appended to the query to bypass an LLM's safety features. The output is a collection of these suffixes along with their attack success rates against various LLMs, letting users pinpoint vulnerabilities. Its primary audience is security analysts and researchers focused on AI safety.
No commits in the last 6 months.
Use this if you are an AI safety researcher or security analyst needing to systematically and rapidly generate adversarial examples to test the defenses of large language models against harmful prompts.
Not ideal if you are looking for a tool to prevent harmful content generation: this tool is designed to *find* vulnerabilities, not fix them.
Stars: 85
Forks: 8
Language: Python
License: —
Category:
Last pushed: Nov 03, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OSU-NLP-Group/AmpleGCG"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
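For scripted access, the curl call above can be wrapped in a few lines of Python. This is a minimal sketch using only the standard library; the endpoint path is taken from the example, but the shape and field names of the returned JSON are assumptions, not documented here.

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the API URL for a given GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """GET the quality record and decode the JSON body.

    Note: the structure of the response (e.g. whether it contains
    'stars' or 'last_pushed' keys) is an assumption.
    """
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("OSU-NLP-Group", "AmpleGCG")
    print(json.dumps(data, indent=2))
```

With a free API key, you would typically add it as a header or query parameter before calling `urlopen`; the exact mechanism is not specified here, so check the service's documentation.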