UCSB-NLP-Chang/SemanticSmooth

Implementation of the paper 'Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing'

Score: 37 / 100 (Emerging)

This project helps protect large language models (LLMs) from 'jailbreak' attacks, in which users craft prompts to bypass safety measures. Given an LLM and a potentially malicious prompt, it transforms the prompt into a form the LLM can safely respond to, outputting the modified prompt. It is aimed at AI safety researchers and developers deploying LLMs who need to harden their models against misuse.
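At a high level, the paper's defense aggregates the LLM's outputs over several semantically transformed copies of the input prompt (e.g., paraphrasing or summarizing it) and keeps the majority outcome. The sketch below illustrates that idea only, under stated assumptions: the transformation list, the llm and judge callables, and every function name here are hypothetical stand-ins, not the repo's actual API.

    import random
    from collections import Counter

    # Illustrative transformation set; the repo defines its own
    # transformation prompts, and the real list differs.
    TRANSFORMATIONS = ["paraphrase", "summarize", "fix the spelling of"]

    def transform(prompt: str, kind: str, llm) -> str:
        """Use the LLM itself to apply one semantics-preserving rewrite."""
        return llm(f"Please {kind} the following text:\n\n{prompt}")

    def semantic_smooth(prompt: str, llm, judge, n_copies: int = 5) -> str:
        """Answer over randomly transformed copies; keep the majority outcome.

        `llm` maps a prompt string to a response string; `judge` maps a
        response to a coarse label such as "refusal" or "answer" (both are
        assumed helpers, not part of this repo's documented interface).
        """
        responses = []
        for _ in range(n_copies):
            kind = random.choice(TRANSFORMATIONS)
            responses.append(llm(transform(prompt, kind, llm)))
        labels = [judge(r) for r in responses]
        majority = Counter(labels).most_common(1)[0][0]
        # Return one response consistent with the majority decision.
        return next(r for r, lab in zip(responses, labels) if lab == majority)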

No commits in the last 6 months.

Use this if you are responsible for the security and ethical deployment of large language models and need a method to defend against adversarial prompts.

Not ideal if you are looking for a general-purpose content filter or a way to manage data privacy in your LLM deployment.

AI Safety · Large Language Models · Adversarial Attacks · LLM Security · Prompt Engineering
Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 16 / 25
Community: 15 / 25

Stars: 23
Forks: 5
Language: Python
License: MIT
Last pushed: Jun 09, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/UCSB-NLP-Chang/SemanticSmooth"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
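
For scripted access, the same endpoint can be queried from Python. A minimal sketch using the requests library follows; the response schema is not documented on this page, so the example simply prints the raw JSON.

    import requests

    # Endpoint copied from the curl example above; no key needed at the free tier.
    URL = ("https://pt-edge.onrender.com/api/v1/quality/"
           "transformers/UCSB-NLP-Chang/SemanticSmooth")

    resp = requests.get(URL, timeout=10)
    resp.raise_for_status()
    print(resp.json())  # field names are undocumented here; inspect the payload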