jc-ryan/holistic_automated_red_teaming

[EMNLP 2024] Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction

Score: 19 / 100 (Experimental)

This framework helps AI safety researchers systematically identify and understand vulnerabilities in large language models (LLMs). It takes a risk taxonomy and seed questions as input and generates diverse, multi-turn adversarial test cases; the output is a comprehensive evaluation of an LLM's safety issues that guides targeted improvements for better model alignment. It is aimed at LLM developers and safety engineers.
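For orientation, a minimal sketch of the loop that description implies, assuming stand-in callables for the attacker model, target model, and safety judge; none of the names below (TestCase, red_team, etc.) come from this repository.

from dataclasses import dataclass, field

@dataclass
class TestCase:
    risk_category: str                                  # leaf of the risk taxonomy
    seed_question: str                                  # seed question for that category
    dialogue: list = field(default_factory=list)        # accumulated (prompt, reply) turns

def red_team(taxonomy, seeds, attacker, target, judge, max_turns=3):
    """Walk the risk taxonomy top-down, expand each seed question into a
    multi-turn probe of the target model, and record unsafe exchanges."""
    findings = []
    for category in taxonomy:
        for seed in seeds.get(category, []):
            case = TestCase(risk_category=category, seed_question=seed)
            prompt = seed
            for _ in range(max_turns):
                reply = target(prompt)                      # target LLM answers
                case.dialogue.append((prompt, reply))
                if judge(category, prompt, reply):          # unsafe response detected
                    findings.append(case)
                    break
                prompt = attacker(category, case.dialogue)  # attacker escalates next turn
    return findings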

No commits in the last 6 months.

Use this if you need to comprehensively test your LLM for safety vulnerabilities across many risk categories using multi-turn, human-like interactions.

Not ideal if you are looking for a simple, quick check for basic safety issues or if your primary concern is single-turn attack success rates rather than broad coverage.

LLM-safety AI-red-teaming model-evaluation AI-alignment risk-assessment
No License · Stale (6 months) · No Package · No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 5 / 25

How are scores calculated? Each of the four components (Maintenance, Adoption, Maturity, Community) contributes up to 25 points, and they sum to the overall score: 0 + 6 + 8 + 5 = 19 / 100.

Stars: 17
Forks: 1
Language: Python
License: None
Last pushed: Nov 09, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jc-ryan/holistic_automated_red_teaming"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
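The same endpoint can also be queried from Python using only the standard library; this is a sketch, and since the shape of the JSON response is not documented here, it is simply pretty-printed.

import json
import urllib.request

URL = ("https://pt-edge.onrender.com/api/v1/quality/llm-tools/"
       "jc-ryan/holistic_automated_red_teaming")

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))   # inspect whichever fields the API returns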