knoveleng/steering
Official repo for the paper: "Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection"
This project gives AI developers and researchers fine-grained control over Large Language Model (LLM) behavior, for example making a model more helpful or more resistant to misuse. It applies activation "steering" to an existing LLM, selecting which layers to intervene on and preserving activation norms so the model retains its core capabilities. It is aimed at practitioners who need to adjust model responses for safety, alignment, or specific tasks.
Use this if you need to reliably shift an LLM's output behavior, such as making it more or less likely to produce certain kinds of content, while keeping the model's overall quality and understanding intact.
Not ideal if you are a non-technical user looking for a ready-to-use application, or if you only need coarse-grained control over an LLM's responses.
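To make the idea concrete, here is a minimal sketch of norm-preserving activation steering, the general technique the paper's title refers to: a steering direction is added to a hidden state, and the result is rescaled so its L2 norm matches the original. This is an illustrative toy in NumPy, not the repo's actual implementation; the function name, `alpha` scale, and dimensionality are all assumptions.

```python
import numpy as np

def steer_norm_preserving(h, v, alpha=4.0):
    """Add a steering direction v to hidden state h, then rescale the
    result so its L2 norm equals that of h (norm-preserving steering).
    Illustrative sketch only -- not the repo's API."""
    steered = h + alpha * (v / np.linalg.norm(v))
    return steered * (np.linalg.norm(h) / np.linalg.norm(steered))

rng = np.random.default_rng(0)
h = rng.standard_normal(768)   # toy hidden state
v = rng.standard_normal(768)   # toy steering direction
out = steer_norm_preserving(h, v)
print(np.isclose(np.linalg.norm(out), np.linalg.norm(h)))  # norm is unchanged
```

In a real model this intervention would be applied at selected transformer layers during the forward pass; which layers to select is the paper's contribution.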
Stars: 9
Forks: 1
Language: Jupyter Notebook
License: —
Category: —
Last pushed: Feb 20, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/knoveleng/steering"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
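The same endpoint can be called from Python using only the standard library. The path layout (`/{ecosystem}/{owner}/{repo}`) is inferred from the curl example above, and the JSON response schema is not documented here, so treat both as assumptions.

```python
import json
from urllib.request import urlopen

# Base endpoint taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem, owner, repo):
    # Path layout inferred from the example URL; not an official spec.
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem, owner, repo):
    # Anonymous access is rate-limited (100 requests/day per the notes above).
    with urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)

print(quality_url("transformers", "knoveleng", "steering"))
```

Calling `fetch_quality("transformers", "knoveleng", "steering")` returns the parsed JSON payload; its fields are whatever the service emits.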
Higher-rated alternatives
steering-vectors/steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
jianghoucheng/AlphaEdit
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)
kmeng01/memit
Mass-editing thousands of facts into a transformer memory (ICLR 2023)
boyiwei/alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
jianghoucheng/AnyEdit
AnyEdit: Edit Any Knowledge Encoded in Language Models, ICML 2025