chawins/pal

PAL: Proxy-Guided Black-Box Attack on Large Language Models

Quality score: 36 / 100 (Emerging)

This project helps security researchers and AI safety engineers evaluate the robustness of large language models (LLMs) against adversarial prompts. Given a pre-trained LLM and a set of "harmful behavior" prompts as input, it generates optimized adversarial prompts designed to make the model produce unsafe or otherwise undesired content. The output includes an attack success rate, quantifying how vulnerable the model is to generating harmful responses.
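
As the title suggests, the attack uses a proxy model to guide the search for adversarial prompts while treating the target as a black box: cheap proxy scores rank candidate prompt mutations so that expensive target queries are spent only on the most promising candidates. The Python sketch below illustrates that loop in broad strokes under those assumptions; every function in it (proxy_loss, query_target, is_jailbroken) is a hypothetical placeholder, not the repository's actual API.

import random
import string

def proxy_loss(prompt: str, suffix: str) -> float:
    # Placeholder: in the real attack this would be the loss of a white-box
    # proxy LLM on the desired target completion. Lower is better.
    return random.random()

def query_target(prompt: str, suffix: str) -> str:
    # Placeholder: in the real attack this is a query to the black-box LLM API.
    return "I cannot help with that."

def is_jailbroken(response: str) -> bool:
    # Placeholder success check; real attacks match refusal patterns.
    return not response.startswith("I cannot")

def attack(prompt: str, iters: int = 100, pool: int = 16) -> str | None:
    suffix = "! " * 8  # initial adversarial suffix
    for _ in range(iters):
        # Propose candidates by mutating one character of the current suffix.
        candidates = []
        for _ in range(pool):
            i = random.randrange(len(suffix))
            c = random.choice(string.ascii_letters + string.punctuation)
            candidates.append(suffix[:i] + c + suffix[i + 1:])
        # Rank all candidates with the cheap proxy...
        suffix = min(candidates, key=lambda s: proxy_loss(prompt, s))
        # ...and spend a black-box query only on the single best one.
        if is_jailbroken(query_target(prompt, suffix)):
            return suffix  # attack succeeded
    return None  # no success within the query budget

print(attack("<harmful behavior prompt>"))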

No commits in the last 6 months.

Use this if you need to rigorously test the safety and ethical guardrails of an LLM by finding its vulnerabilities to adversarial attacks in a black-box, query-only setting.

Not ideal if you are looking to fine-tune an LLM for specific applications or improve its general performance, as this tool is focused purely on identifying and exploiting safety weaknesses.

Tags: AI safety · LLM security · red teaming · model evaluation · ethical AI
Stale (6m) · No package · No dependents
Maintenance: 0 / 25
Adoption: 8 / 25
Maturity: 16 / 25
Community: 12 / 25
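
Each category is scored out of 25, and the four sum to the headline figure: 0 + 8 + 16 + 12 = 36 / 100.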

Stars: 57
Forks: 7
Language: Python
License: MIT
Last pushed: Aug 17, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/chawins/pal"

Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
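
The same data can be fetched programmatically. Below is a minimal Python sketch using only the standard library; it assumes the endpoint returns JSON, and since the response schema is not documented on this page, it simply pretty-prints whatever comes back.

import json
from urllib.request import urlopen

# The endpoint from the curl example above; the response is assumed to be JSON.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/chawins/pal"

with urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# The response schema is not documented here, so pretty-print it all.
print(json.dumps(data, indent=2))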