chawins/pal

PAL: Proxy-Guided Black-Box Attack on Large Language Models

Quality score: 36 / 100 (Emerging)

This project helps security researchers and AI safety engineers evaluate the robustness of large language models (LLMs) against adversarial prompts. Given a pre-trained LLM and a set of "harmful behavior" prompts as input, it generates optimized adversarial prompts designed to make the model produce unsafe or otherwise undesired content. The output includes an attack success rate, quantifying how vulnerable the model is to generating harmful responses.
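
As the title suggests, the attack uses a proxy model to guide the search for adversarial prompts while treating the target as a black box: cheap proxy scores rank candidate prompt mutations so that expensive target queries are spent only on the most promising candidates. The Python sketch below illustrates that loop in broad strokes under those assumptions; every function in it (proxy_loss, query_target, is_jailbroken) is a hypothetical placeholder, not the repository's actual API.

import random
import string

def proxy_loss(prompt: str, suffix: str) -> float:
    # Placeholder: in the real attack this would be the loss of a white-box
    # proxy LLM on the desired target completion. Lower is better.
    return random.random()

def query_target(prompt: str, suffix: str) -> str:
    # Placeholder: in the real attack this is a query to the black-box LLM API.
    return "I cannot help with that."

def is_jailbroken(response: str) -> bool:
    # Placeholder success check; real attacks match refusal patterns.
    return not response.startswith("I cannot")

def attack(prompt: str, iters: int = 100, pool: int = 16) -> str | None:
    suffix = "! " * 8  # initial adversarial suffix
    for _ in range(iters):
        # Propose candidates by mutating one character of the current suffix.
        candidates = []
        for _ in range(pool):
            i = random.randrange(len(suffix))
            c = random.choice(string.ascii_letters + string.punctuation)
            candidates.append(suffix[:i] + c + suffix[i + 1:])
        # Rank all candidates with the cheap proxy...
        suffix = min(candidates, key=lambda s: proxy_loss(prompt, s))
        # ...and spend a black-box query only on the single best one.
        if is_jailbroken(query_target(prompt, suffix)):
            return suffix  # attack succeeded
    return None  # no success within the query budget

print(attack("<harmful behavior prompt>"))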

No commits in the last 6 months.

Use this if you need to rigorously test the safety and ethical guardrails of an LLM by finding its vulnerabilities to adversarial attacks in a black-box, query-only setting.

Not ideal if you are looking to fine-tune an LLM for specific applications or improve its general performance, as this tool is focused purely on identifying and exploiting safety weaknesses.

Tags: AI safety · LLM security · red teaming · model evaluation · ethical AI
Stale (6m) · No package · No dependents
Maintenance: 0 / 25
Adoption: 8 / 25
Maturity: 16 / 25
Community: 12 / 25
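
Each category is scored out of 25, and the four sum to the headline figure: 0 + 8 + 16 + 12 = 36 / 100.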

Stars: 57
Forks: 7
Language: Python
License: MIT
Last pushed: Aug 17, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/chawins/pal"

Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
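
The same data can be fetched programmatically. Below is a minimal Python sketch using only the standard library; it assumes the endpoint returns JSON, and since the response schema is not documented on this page, it simply pretty-prints whatever comes back.

import json
from urllib.request import urlopen

# The endpoint from the curl example above; the response is assumed to be JSON.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/chawins/pal"

with urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# The response schema is not documented here, so pretty-print it all.
print(json.dumps(data, indent=2))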