wanglne/DELMAN

[ACL 2025 Findings] DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing

Score: 22 / 100 · Experimental

This project helps AI safety engineers and developers protect large language models (LLMs) from jailbreaking attacks. It applies targeted model edits to an existing LLM as a dynamic defense mechanism, reducing its susceptibility to malicious prompts while preserving performance on benign tasks. The output is a more robust, secure LLM.

No commits in the last 6 months.

Use this if you are a machine learning engineer or researcher responsible for the safety and security of LLMs and need to mitigate jailbreaking attempts without compromising model performance on legitimate queries.

Not ideal if you are looking for a plug-and-play solution for end-users or do not have experience with LLM deployment and model editing techniques.

LLM security · AI safety · model robustness · red teaming · prompt engineering · defense
Stale 6m · No Package · No Dependents
Maintenance 2 / 25
Adoption 5 / 25
Maturity 15 / 25
Community 0 / 25


Stars: 9
Forks:
Language: Python
License: MIT
Last pushed: May 27, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/wanglne/DELMAN"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.