nrimsky/LM-exp

LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces

Quality score: 37 / 100 (Emerging)

This project helps AI safety researchers and alignment practitioners understand and control large language model (LLM) behavior. It provides tools to explore how internal model activations influence outputs, allowing users to modify responses, for example reducing refusals or mitigating sycophancy. Researchers working on interpretability or steerability of LLMs can use it to gain insight into model mechanisms.
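The core idea in the repo's notebooks is activation steering: compute a direction in the model's residual-stream activations (for instance, the difference between activations for two contrasting prompts) and add it back during inference to shift behavior. The sketch below illustrates that idea only; it uses Hugging Face transformers with GPT-2 as a stand-in model, and the layer index, coefficient, and contrast prompts are arbitrary assumptions, not values taken from this repository.

# Minimal activation-steering sketch (illustrative; not the repo's exact code).
# Assumptions: GPT-2 as a stand-in model, layer 6, coefficient 4.0, toy prompts.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER = 6      # which transformer block's output to steer (assumed)
COEFF = 4.0    # steering strength (assumed)

def residual_at_last_token(prompt: str) -> torch.Tensor:
    """Residual-stream activation of the final token after block LAYER."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = model(ids, output_hidden_states=True).hidden_states
    # hidden[0] is the embedding output, hidden[i] is the output of block i-1.
    return hidden[LAYER + 1][0, -1, :]

# Steering vector = difference between activations for contrasting prompts.
steering_vec = (
    residual_at_last_token("I love answering questions.")
    - residual_at_last_token("I refuse to answer questions.")
)

def steering_hook(module, inputs, output):
    # GPT2Block outputs are tuples; add the vector to every position's residual stream.
    hidden = output[0] + COEFF * steering_vec
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
try:
    ids = tokenizer("Can you help me with my homework?", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=30, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()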

103 stars. No commits in the last 6 months.

Use this if you are an AI safety researcher or alignment practitioner looking to explore and modify the internal workings of large language models to control their behavior.

Not ideal if you are an application developer looking for a plug-and-play solution to integrate LLMs into a product, as this focuses on deep interpretability research.

Tags: AI-safety, LLM-interpretability, model-steering, language-model-alignment, AI-ethics-research
No License · Stale (6 months) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 9 / 25
Maturity: 8 / 25
Community: 20 / 25

Stars: 103
Forks: 28
Language: Jupyter Notebook
License: None
Last pushed: Sep 21, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/nrimsky/LM-exp"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
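The same data can be fetched from Python. The sketch below calls the endpoint shown in the curl command above; the JSON field names it prints (e.g. "score", "stars") are assumptions about the response schema, which is not documented here.

# Fetch the quality data for this repo via the API (sketch; response field names are assumed).
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/nrimsky/LM-exp"

resp = requests.get(URL, timeout=10)  # no API key needed for up to 100 requests/day
resp.raise_for_status()
data = resp.json()

# Field names below are illustrative assumptions about the JSON schema.
print(data.get("score"), data.get("stars"))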