git-disl/Vaccine
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS 2024).
This project offers a way to 'immunize' large language models (LLMs) against being fine-tuned with harmful or undesirable content. It takes an existing LLM, applies a special alignment process, and produces a more robust LLM that resists learning harmful behaviors during subsequent fine-tuning. This is for researchers or engineers who are building and deploying LLMs and want to ensure their models remain safe and ethical.
Use this if you are developing or deploying LLMs and are concerned about them being fine-tuned on malicious or harmful datasets.
Not ideal if you are a general user of an LLM who wants to prevent it from generating harmful content; this tool is for those who train and align the models.
Stars: 49
Forks: 5
Language: Shell
License: Apache-2.0
Last pushed: Jan 15, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/git-disl/Vaccine"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
THU-BPM/MarkLLM
MarkLLM: An Open-Source Toolkit for LLM Watermarking (EMNLP 2024 System Demonstration)
zjunlp/Deco
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
HillZhang1999/ICD
Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced...
voidism/DoLa
Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality...
kaist-cvml/I-HallA-v1.0
[AAAI 2025] Official Implementation of I-HallA v1.0