knoveleng/steering
Official repo for the paper: "Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection"
This project gives AI developers and researchers fine-grained control over Large Language Model (LLM) behavior, for example making a model more helpful or more resistant to misuse. It applies activation "steering" to an existing LLM, selecting which layers to intervene on and preserving activation norms so the model retains its core capabilities. It is aimed at practitioners who need to adjust model responses for safety, alignment, or specific tasks.
Use this if you need to reliably shift an LLM's output behavior, such as making it more or less likely to produce certain kinds of content, while keeping the model's overall quality and understanding intact.
Not ideal if you are a non-technical user looking for a ready-to-use application, or if you only need coarse-grained control over an LLM's responses.
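To make the idea concrete, here is a minimal sketch of norm-preserving activation steering, the general technique the paper's title refers to: a steering direction is added to a hidden state, and the result is rescaled so its L2 norm matches the original. This is an illustrative toy in NumPy, not the repo's actual implementation; the function name, `alpha` scale, and dimensionality are all assumptions.

```python
import numpy as np

def steer_norm_preserving(h, v, alpha=4.0):
    """Add a steering direction v to hidden state h, then rescale the
    result so its L2 norm equals that of h (norm-preserving steering).
    Illustrative sketch only -- not the repo's API."""
    steered = h + alpha * (v / np.linalg.norm(v))
    return steered * (np.linalg.norm(h) / np.linalg.norm(steered))

rng = np.random.default_rng(0)
h = rng.standard_normal(768)   # toy hidden state
v = rng.standard_normal(768)   # toy steering direction
out = steer_norm_preserving(h, v)
print(np.isclose(np.linalg.norm(out), np.linalg.norm(h)))  # norm is unchanged
```

In a real model this intervention would be applied at selected transformer layers during the forward pass; which layers to select is the paper's contribution.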
Stars: 9
Forks: 1
Language: Jupyter Notebook
License: —
Category: —
Last pushed: Feb 20, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/knoveleng/steering"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
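The same endpoint can be called from Python using only the standard library. The path layout (`/{ecosystem}/{owner}/{repo}`) is inferred from the curl example above, and the JSON response schema is not documented here, so treat both as assumptions.

```python
import json
from urllib.request import urlopen

# Base endpoint taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem, owner, repo):
    # Path layout inferred from the example URL; not an official spec.
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem, owner, repo):
    # Anonymous access is rate-limited (100 requests/day per the notes above).
    with urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)

print(quality_url("transformers", "knoveleng", "steering"))
```

Calling `fetch_quality("transformers", "knoveleng", "steering")` returns the parsed JSON payload; its fields are whatever the service emits.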
Higher-rated alternatives
steering-vectors/steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
jianghoucheng/AlphaEdit
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)
kmeng01/memit
Mass-editing thousands of facts into a transformer memory (ICLR 2023)
boyiwei/alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
jianghoucheng/AnyEdit
AnyEdit: Edit Any Knowledge Encoded in Language Models, ICML 2025