OSU-NLP-Group/AmpleGCG
AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
This project helps AI safety researchers and red teamers efficiently test the robustness of large language models (LLMs). Given a harmful query, it generates multiple "adversarial suffixes" (seemingly gibberish strings) that can be appended to the query to bypass an LLM's safety features. The output is a collection of these suffixes along with their attack success rates against various LLMs, letting users pinpoint vulnerabilities. Its primary audience is security analysts and researchers focused on AI safety.
No commits in the last 6 months.
Use this if you are an AI safety researcher or security analyst needing to systematically and rapidly generate adversarial examples to test the defenses of large language models against harmful prompts.
Not ideal if you are looking for a tool to prevent harmful content generation: this tool is designed to *find* vulnerabilities, not fix them.
Stars: 85
Forks: 8
Language: Python
License: —
Category:
Last pushed: Nov 03, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OSU-NLP-Group/AmpleGCG"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
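For scripted access, the curl call above can be wrapped in a few lines of Python. This is a minimal sketch using only the standard library; the endpoint path is taken from the example, but the shape and field names of the returned JSON are assumptions, not documented here.

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the API URL for a given GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """GET the quality record and decode the JSON body.

    Note: the structure of the response (e.g. whether it contains
    'stars' or 'last_pushed' keys) is an assumption.
    """
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("OSU-NLP-Group", "AmpleGCG")
    print(json.dumps(data, indent=2))
```

With a free API key, you would typically add it as a header or query parameter before calling `urlopen`; the exact mechanism is not specified here, so check the service's documentation.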