chawins/pal
PAL: Proxy-Guided Black-Box Attack on Large Language Models
This project helps security researchers and AI safety engineers evaluate the robustness of large language models (LLMs) against adversarial jailbreak prompts. It takes a target LLM and a set of "harmful behavior" prompts as input, then uses a local proxy model to guide the optimization of attack prompts designed to make the target produce unsafe or undesired content, querying the target only as a black box. The output includes an attack success rate, indicating how vulnerable the LLM is to generating harmful responses.
No commits in the last 6 months.
Use this if you need to rigorously test the safety and ethical guardrails of an LLM by finding its vulnerabilities to adversarial attacks in a black-box, query-only setting.
Not ideal if you are looking to fine-tune an LLM for specific applications or improve its general performance, as this tool is focused purely on identifying and exploiting safety weaknesses.
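To make the workflow concrete, below is a minimal, self-contained Python sketch of the general proxy-guided idea: candidate attack suffixes are ranked by a cheap local proxy score, and only the top candidate per step spends a query against the black-box target. Every name and scoring rule here (toy_target, proxy_score, the SECRET trigger) is a toy stand-in invented for illustration; this is not the repo's actual API or attack implementation.

import random

random.seed(0)

GOAL = "how to pick a lock"                  # stand-in "harmful behavior" prompt
VOCAB = list("abcdefghijklmnopqrstuvwxyz ")  # toy token vocabulary
SECRET = "open"                              # toy jailbreak trigger

def toy_target(prompt: str) -> bool:
    """Stand-in for the target LLM's query-only API: True means attack success."""
    return SECRET in prompt

def proxy_score(suffix: str) -> int:
    """Stand-in for a white-box proxy's loss (lower is more promising):
    negative of the best character-level match of SECRET over all windows."""
    k = len(SECRET)
    return -max(
        (sum(a == b for a, b in zip(suffix[i:i + k], SECRET))
         for i in range(len(suffix) - k + 1)),
        default=0,
    )

def mutate(suffix: str) -> str:
    """Replace one random position in the suffix (a GCG-style single-token swap)."""
    i = random.randrange(len(suffix))
    return suffix[:i] + random.choice(VOCAB) + suffix[i + 1:]

suffix, target_queries = "x" * 8, 0
for _ in range(200):
    # Rank candidates with the free local proxy; query the target only for the best.
    best = min((mutate(suffix) for _ in range(32)), key=proxy_score)
    target_queries += 1
    if toy_target(f"{GOAL} {best}"):
        print(f"success after {target_queries} target queries: {best!r}")
        break
    if proxy_score(best) <= proxy_score(suffix):  # hill-climb on the proxy loss
        suffix = best
else:
    print("no success within the query budget")

The point of the proxy is the query accounting: the local model scores 32 candidates per step for free, while the target sees exactly one prompt per step, which is what makes the attack viable against a rate-limited, query-only API.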
Stars: 57
Forks: 7
Language: Python
License: MIT
Category: llm-tools
Last pushed: Aug 17, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/chawins/pal"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
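For programmatic access, the same endpoint can be called from Python. A minimal sketch using only the standard library; the response is assumed to be JSON, and its exact schema is not documented on this page, so it is printed raw for inspection.

import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/chawins/pal"

# No API key needed on the free tier (100 requests/day, per the note above).
with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)  # assumes a JSON response body

print(json.dumps(data, indent=2))  # inspect the fields before relying on them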
Higher-rated alternatives
GreyDGL/PentestGPT
Automated Penetration Testing Agentic Framework Powered by Large Language Models
berylliumsec/nebula
AI-powered penetration testing assistant for automating recon, note-taking, and vulnerability analysis.
ipa-lab/hackingBuddyGPT
Helping Ethical Hackers use LLMs in 50 Lines of Code or less.
MorDavid/BruteForceAI
Advanced LLM-powered brute-force tool combining AI intelligence with automated login attacks
mbrg/power-pwn
An offensive/defense security toolset for discovery, recon and ethical assessment of AI Agents