SecHack365-Fans/prompt2slip

This library tests the ethics of language models using natural adversarial texts.

Score: 34 / 100 (Emerging)

This library helps evaluate the ethical boundaries and potential risks of large language models by finding prompts that cause them to generate specific, often undesirable, target words. You provide a language model and a list of target words, and it outputs adversarial text that forces the model to include those words. It is useful for AI safety researchers, ethics auditors, and machine learning engineers responsible for deploying language AI responsibly.
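
The search underneath a tool like this works roughly as follows: start from a seed prompt, repeatedly perturb it, and keep the variant that pushes the model's output closer to containing the target words. The Python sketch below illustrates that hill-climbing idea only; the function names, mutation strategy, and scoring rule are hypothetical stand-ins, not prompt2slip's actual API.

import random

# Illustrative sketch of an adversarial-prompt search (hypothetical,
# not prompt2slip's API). `model` is any callable that maps a prompt
# string to generated text.

def score(output: str, targets: list[str]) -> float:
    # Fraction of target words that appear in the model's output.
    return sum(t in output for t in targets) / len(targets)

def mutate(prompt: str, vocab: list[str]) -> str:
    # Naive mutation: replace one random token with a vocabulary word.
    tokens = prompt.split()
    tokens[random.randrange(len(tokens))] = random.choice(vocab)
    return " ".join(tokens)

def attack(model, seed: str, targets: list[str], vocab: list[str], steps: int = 200) -> str:
    # Greedy hill-climbing: keep any candidate prompt that raises the score.
    best, best_score = seed, score(model(seed), targets)
    for _ in range(steps):
        candidate = mutate(best, vocab)
        s = score(model(candidate), targets)
        if s > best_score:
            best, best_score = candidate, s
    return best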

No commits in the last 6 months.

Use this if you need to systematically test how easily a language model can be manipulated into producing specific, potentially harmful, or off-topic words or phrases through adversarial prompting.

Not ideal if you are looking for a general-purpose tool to improve language model performance or fine-tune models for specific tasks.

AI ethics · language model testing · adversarial AI · AI safety · model risk assessment
Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 5 / 25
Maturity: 16 / 25
Community: 13 / 25


Stars: 9
Forks: 2
Language: Python
License: MIT
Last pushed: Dec 04, 2021
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/SecHack365-Fans/prompt2slip"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
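
The same endpoint can be queried from code. The snippet below is a minimal Python equivalent of the curl call above; it only prints the raw JSON, since the response schema is not documented on this page and any specific field names would be assumptions.

import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/SecHack365-Fans/prompt2slip"

# Fetch and pretty-print the raw response; no field names are assumed.
with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))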