OSU-NLP-Group/AmpleGCG

AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs

37 / 100 · Emerging

This project helps AI safety researchers and red teamers efficiently test the robustness of large language models (LLMs). It takes a harmful query and generates multiple 'adversarial suffixes'—gibberish phrases—that can be appended to the query to bypass the LLM's safety features. The output is a collection of these suffixes, along with their attack success rates against various LLMs, allowing users to identify vulnerabilities. Security analysts or researchers focused on AI safety would primarily use this tool.
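As a rough illustration of the workflow, the sketch below generates candidate suffixes with the released generator via Hugging Face `transformers` and appends them to a query. The checkpoint name, prompt format, and generation settings here are assumptions for illustration, not the project's documented defaults; consult the repository's README for the exact invocation.

```python
# Minimal sketch (assumptions): the AmpleGCG generator is assumed to be a
# causal-LM checkpoint on Hugging Face; the model ID and prompt format below
# are placeholders, not the project's documented usage.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "osunlp/AmpleGCG-llama2-sourced-llama2-7b-chat"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

harmful_query = "PLACEHOLDER_QUERY"  # the query whose defenses you want to probe
inputs = tokenizer(harmful_query, return_tensors="pt").to(model.device)

# Sample many candidate suffixes in one pass; diverse (group) beam search
# spreads candidates across beam groups so they do not collapse to duplicates.
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    num_beams=50,
    num_beam_groups=50,
    num_return_sequences=50,
    diversity_penalty=1.0,
    do_sample=False,
)

# Strip the prompt tokens and keep only the generated suffix text.
prompt_len = inputs["input_ids"].shape[1]
suffixes = [
    tokenizer.decode(seq[prompt_len:], skip_special_tokens=True)
    for seq in outputs
]

# Each attack prompt is simply the query with a generated suffix appended.
attack_prompts = [f"{harmful_query} {s}" for s in suffixes]
```

The attack prompts would then be sent to the target LLM and judged for refusal or compliance to estimate an attack success rate; that evaluation step is out of scope for this sketch.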

No commits in the last 6 months.

Use this if you are an AI safety researcher or security analyst needing to systematically and rapidly generate adversarial examples to test the defenses of large language models against harmful prompts.

Not ideal if you are looking for a tool to prevent harmful content generation, as this tool is designed to *find* vulnerabilities rather than fix them.

AI Safety · Red Teaming · LLM Security · Vulnerability Assessment · Model Robustness
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 12 / 25


Stars: 85
Forks: 8
Language: Python
License:
Last pushed: Nov 03, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OSU-NLP-Group/AmpleGCG"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
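For scripted use, the same endpoint can be queried from Python; a minimal sketch is below. It uses the public no-key tier, and since the response schema is not documented on this page, it simply prints the returned JSON.

```python
# Minimal sketch: fetch the same quality record as the curl command above.
# Public no-key tier (100 requests/day); the response schema is not documented
# here, so we just print whatever JSON comes back.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/OSU-NLP-Group/AmpleGCG"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()  # surface rate-limit or server errors
print(resp.json())
```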