jc-ryan/holistic_automated_red_teaming

[EMNLP 2024] Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction

Score: 19 / 100 (Experimental)

This framework helps AI safety researchers systematically identify and understand vulnerabilities in large language models (LLMs). It takes a risk taxonomy and seed questions as input and generates diverse, multi-turn adversarial test cases; the output is a comprehensive evaluation of an LLM's safety issues that guides targeted improvements for better model alignment. It is aimed at LLM developers and safety engineers.
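For orientation, a minimal sketch of the loop that description implies, assuming stand-in callables for the attacker model, target model, and safety judge; none of the names below (TestCase, red_team, etc.) come from this repository.

from dataclasses import dataclass, field

@dataclass
class TestCase:
    risk_category: str                                  # leaf of the risk taxonomy
    seed_question: str                                  # seed question for that category
    dialogue: list = field(default_factory=list)        # accumulated (prompt, reply) turns

def red_team(taxonomy, seeds, attacker, target, judge, max_turns=3):
    """Walk the risk taxonomy top-down, expand each seed question into a
    multi-turn probe of the target model, and record unsafe exchanges."""
    findings = []
    for category in taxonomy:
        for seed in seeds.get(category, []):
            case = TestCase(risk_category=category, seed_question=seed)
            prompt = seed
            for _ in range(max_turns):
                reply = target(prompt)                      # target LLM answers
                case.dialogue.append((prompt, reply))
                if judge(category, prompt, reply):          # unsafe response detected
                    findings.append(case)
                    break
                prompt = attacker(category, case.dialogue)  # attacker escalates next turn
    return findings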

No commits in the last 6 months.

Use this if you need to comprehensively test your LLM for safety vulnerabilities across many risk categories using multi-turn, human-like interactions.

Not ideal if you are looking for a simple, quick check for basic safety issues or if your primary concern is single-turn attack success rates rather than broad coverage.

LLM-safety AI-red-teaming model-evaluation AI-alignment risk-assessment
No License · Stale (6 months) · No Package · No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 5 / 25

How are scores calculated? Each of the four components (Maintenance, Adoption, Maturity, Community) contributes up to 25 points, and they sum to the overall score: 0 + 6 + 8 + 5 = 19 / 100.

Stars: 17
Forks: 1
Language: Python
License: None
Last pushed: Nov 09, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jc-ryan/holistic_automated_red_teaming"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
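The same endpoint can also be queried from Python using only the standard library; this is a sketch, and since the shape of the JSON response is not documented here, it is simply pretty-printed.

import json
import urllib.request

URL = ("https://pt-edge.onrender.com/api/v1/quality/llm-tools/"
       "jc-ryan/holistic_automated_red_teaming")

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))   # inspect whichever fields the API returns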