nrimsky/LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
This project helps AI safety researchers and alignment practitioners understand and control the behavior of large language models (LLMs). It provides tools to explore how internal model states influence outputs, allowing users to modify responses, for example by reducing refusals to answer or mitigating sycophancy. Researchers working on interpretability or steerability of LLMs would use this to gain insights into model mechanisms.
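The core idea behind activation steering can be sketched as follows: a steering vector is added to a model's hidden activations at inference time to shift its behavior. The toy NumPy model below is purely illustrative (the repository's experiments use real LLMs); the network, weights, and steering vector here are stand-ins.

```python
import numpy as np

def mlp_forward(x, W1, W2, steering_vector=None):
    """Forward pass of a toy 2-layer MLP; optionally add a steering
    vector to the hidden activations (the essence of activation steering)."""
    h = np.tanh(x @ W1)          # hidden activations
    if steering_vector is not None:
        h = h + steering_vector  # shift the internal state
    return h @ W2

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 2))
x = rng.normal(size=(1, 4))

baseline = mlp_forward(x, W1, W2)
# Steering with a (hypothetical) constant direction changes the output
# without retraining or changing the input.
steered = mlp_forward(x, W1, W2, steering_vector=0.5 * np.ones(8))
```

In practice the steering vector is derived from the model itself (e.g. from differences of activations on contrasting prompts) and injected at a chosen layer via a forward hook.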
103 stars. No commits in the last 6 months.
Use this if you are an AI safety researcher or alignment practitioner looking to explore and modify the internal workings of large language models to control their behavior.
Not ideal if you are an application developer looking for a plug-and-play solution to integrate LLMs into a product, as this focuses on deep interpretability research.
Stars
103
Forks
28
Language
Jupyter Notebook
License
—
Category
Last pushed
Sep 21, 2023
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/nrimsky/LM-exp"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
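For scripted access, the same endpoint can be called from Python. The helper below only constructs the URL pattern shown in the curl example above; the response schema is not documented here, so parsing is left out and the `ecosystem` path segment (`transformers`) is taken as-is from the example.

```python
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem, owner, repo):
    """Build the quality-data API URL for a repository,
    following the pattern from the documented curl example."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

url = quality_url("transformers", "nrimsky", "LM-exp")
```

A request to this URL (e.g. with `urllib.request` or `requests`) then needs no API key up to the free daily limit.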
Higher-rated alternatives
PaddlePaddle/PaddleNLP
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
meta-llama/llama-cookbook
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: Getting started...
arcee-ai/mergekit
Tools for merging pretrained large language models.
changyeyu/LLM-RL-Visualized
100+ LLM/RL Algorithm Maps
mindspore-lab/step_into_llm
MindSpore online courses: Step into LLM