SafeAILab/RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
This project helps AI developers and researchers align the responses of large language models (LLMs) with human preferences for helpfulness and harmlessness. RAIN (Rewindable Auto-regressive INference) takes a pre-trained, frozen LLM and a prompt, has the model self-evaluate its own draft generations during inference, and rewinds and resamples when a draft scores poorly, producing a safer, more aligned response without retraining or fine-tuning. This is useful for anyone deploying or evaluating LLMs where safe, user-friendly output is critical.
No commits in the last 6 months.
Use this if you need to quickly improve the safety and alignment of a pre-trained large language model's outputs without the time and computational cost of fine-tuning or requiring additional labeled data.
Not ideal if you need deep, domain-specific behavioral changes that go beyond general safety and helpfulness alignment, since RAIN does not modify the model's weights, underlying knowledge, or core capabilities.
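The rewind-and-resample idea can be sketched as a toy decoding loop. Note this is a minimal illustration, not the project's actual API: the function names, candidate pool, and scores below are all hypothetical stand-ins (in RAIN, the self-evaluation score comes from the frozen LLM itself).

```python
FALLBACK = "I'm sorry, I can't help with that."

def self_evaluate(draft, score):
    """Stub for the model scoring its own draft for harmlessness.
    Here the score is supplied alongside each canned draft."""
    return score

def rain_decode(drafts, threshold=0.5):
    """Accept the first draft whose self-evaluation clears the
    threshold; otherwise 'rewind' and sample the next candidate."""
    for draft, score in drafts:
        if self_evaluate(draft, score) >= threshold:
            return draft
    return FALLBACK  # refuse if no draft ever passes

# Toy candidate pool: (draft text, self-evaluation score)
drafts = [
    ("Sure, here is how to pick a lock: ...", 0.1),          # rejected, rewind
    ("I can't help with that, but a locksmith can.", 0.9),   # accepted
]
print(rain_decode(drafts))  # prints the harmless draft
```

The key property this mirrors is that alignment happens purely at inference time: the "model" is never updated, only its sampling trajectory.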
Stars: 98
Forks: 4
Language: Python
License: BSD-2-Clause
Last pushed: May 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/SafeAILab/RAIN"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
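For scripted access, the same endpoint can be called from Python using only the standard library. This sketch assumes the endpoint returns JSON; the helper names are illustrative, not part of any official client.

```python
import json
import urllib.request
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem, repo):
    """Build the quality-endpoint URL for an ecosystem/repo pair."""
    return f"{BASE}/{quote(ecosystem)}/{quote(repo, safe='/')}"

def fetch_quality(ecosystem, repo, timeout=10):
    """GET the quality record; assumes a JSON response body."""
    with urllib.request.urlopen(quality_url(ecosystem, repo), timeout=timeout) as resp:
        return json.load(resp)

print(quality_url("transformers", "SafeAILab/RAIN"))
```

Calling `fetch_quality("transformers", "SafeAILab/RAIN")` issues the same request as the curl command above.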
Higher-rated alternatives
steering-vectors/steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
jianghoucheng/AlphaEdit
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)
kmeng01/memit
Mass-editing thousands of facts into a transformer memory (ICLR 2023)
boyiwei/alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
jianghoucheng/AnyEdit
AnyEdit: Edit Any Knowledge Encoded in Language Models, ICML 2025