SafeAILab/RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
This project helps AI developers and researchers align the responses of large language models (LLMs) with human preferences for helpfulness and harmlessness. RAIN (Rewindable Auto-regressive INference) takes a pre-trained, frozen LLM and a prompt, has the model self-evaluate its own draft generations during inference, and rewinds and resamples when a draft scores poorly, producing a safer, more aligned response without retraining or fine-tuning. This is useful for anyone deploying or evaluating LLMs where safe, user-friendly output is critical.
No commits in the last 6 months.
Use this if you need to quickly improve the safety and alignment of a pre-trained large language model's outputs without the time and computational cost of fine-tuning or requiring additional labeled data.
Not ideal if you need deep, domain-specific behavioral changes that go beyond general safety and helpfulness alignment, since RAIN does not modify the model's weights, underlying knowledge, or core capabilities.
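The rewind-and-resample idea can be sketched as a toy decoding loop. Note this is a minimal illustration, not the project's actual API: the function names, candidate pool, and scores below are all hypothetical stand-ins (in RAIN, the self-evaluation score comes from the frozen LLM itself).

```python
FALLBACK = "I'm sorry, I can't help with that."

def self_evaluate(draft, score):
    """Stub for the model scoring its own draft for harmlessness.
    Here the score is supplied alongside each canned draft."""
    return score

def rain_decode(drafts, threshold=0.5):
    """Accept the first draft whose self-evaluation clears the
    threshold; otherwise 'rewind' and sample the next candidate."""
    for draft, score in drafts:
        if self_evaluate(draft, score) >= threshold:
            return draft
    return FALLBACK  # refuse if no draft ever passes

# Toy candidate pool: (draft text, self-evaluation score)
drafts = [
    ("Sure, here is how to pick a lock: ...", 0.1),          # rejected, rewind
    ("I can't help with that, but a locksmith can.", 0.9),   # accepted
]
print(rain_decode(drafts))  # prints the harmless draft
```

The key property this mirrors is that alignment happens purely at inference time: the "model" is never updated, only its sampling trajectory.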
Stars: 98
Forks: 4
Language: Python
License: BSD-2-Clause
Last pushed: May 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/SafeAILab/RAIN"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
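For scripted access, the same endpoint can be called from Python using only the standard library. This sketch assumes the endpoint returns JSON; the helper names are illustrative, not part of any official client.

```python
import json
import urllib.request
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem, repo):
    """Build the quality-endpoint URL for an ecosystem/repo pair."""
    return f"{BASE}/{quote(ecosystem)}/{quote(repo, safe='/')}"

def fetch_quality(ecosystem, repo, timeout=10):
    """GET the quality record; assumes a JSON response body."""
    with urllib.request.urlopen(quality_url(ecosystem, repo), timeout=timeout) as resp:
        return json.load(resp)

print(quality_url("transformers", "SafeAILab/RAIN"))
```

Calling `fetch_quality("transformers", "SafeAILab/RAIN")` issues the same request as the curl command above.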
Higher-rated alternatives
steering-vectors/steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
jianghoucheng/AlphaEdit
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)
kmeng01/memit
Mass-editing thousands of facts into a transformer memory (ICLR 2023)
boyiwei/alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
jianghoucheng/AnyEdit
AnyEdit: Edit Any Knowledge Encoded in Language Models, ICML 2025