jc-ryan/holistic_automated_red_teaming
[EMNLP 2024] Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction
This framework helps AI safety researchers systematically identify and understand vulnerabilities in large language models (LLMs). It takes a risk taxonomy and seed questions as input to generate diverse, multi-turn adversarial test cases. The output is a comprehensive evaluation of an LLM's safety issues, guiding targeted improvements for better model alignment. This is designed for LLM developers and safety engineers.
No commits in the last 6 months.
Use this if you need to comprehensively test your LLM for safety vulnerabilities across many risk categories using multi-turn, human-like interactions.
Not ideal if you are looking for a simple, quick check for basic safety issues or if your primary concern is single-turn attack success rates rather than broad coverage.
Stars: 17
Forks: 1
Language: Python
License: —
Category:
Last pushed: Nov 09, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jc-ryan/holistic_automated_red_teaming"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
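The curl one-liner above can also be scripted. A minimal Python sketch, using only the standard library: it builds the per-repo endpoint URL from the owner/repo pair shown in the curl command. The response schema and the header used to pass an API key are not documented on this page, so the actual request is left commented out and unauthenticated.

```python
# Sketch: querying the pt-edge quality API for a repo's stats.
# Only the endpoint URL is taken from this page; the response
# schema and any auth header are assumptions.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repo endpoint URL used by the curl example."""
    return f"{API_BASE}/{owner}/{repo}"

url = quality_url("jc-ryan", "holistic_automated_red_teaming")
print(url)

# Uncomment to perform the request (100/day without a key):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(json.dumps(data, indent=2))
```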
Higher-rated alternatives
GreyDGL/PentestGPT
Automated Penetration Testing Agentic Framework Powered by Large Language Models
berylliumsec/nebula
AI-powered penetration testing assistant for automating recon, note-taking, and vulnerability analysis.
ipa-lab/hackingBuddyGPT
Helping Ethical Hackers use LLMs in 50 Lines of Code or less.
MorDavid/BruteForceAI
Advanced LLM-powered brute-force tool combining AI intelligence with automated login attacks
mbrg/power-pwn
An offensive/defense security toolset for discovery, recon and ethical assessment of AI Agents