koayon/atp_star

PyTorch and NNsight implementation of AtP* (Kramár et al., 2024, Google DeepMind)

Score: 19 / 100 (Experimental)

This project helps machine learning researchers and interpretability engineers understand how large language models (LLMs) make decisions. By analyzing which parts of the model contribute most to a specific output or behavior, it provides insights into the model's internal workings. You input a trained LLM and a task, and it outputs an analysis of which model components are most responsible for that behavior.
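The component-attribution idea described above can be sketched with a toy example. AtP approximates the effect of patching a component's activation (swapping in its value from a corrupted prompt) with a first-order term: the gradient of the output with respect to that activation, times the activation difference. The minimal sketch below uses a hand-rolled linear "model" with illustrative names, not the repo's actual API; for a linear model the estimate matches the exact patched effect.

```python
# Minimal sketch of attribution patching (AtP): instead of re-running the
# model once per component (exact activation patching), approximate each
# component's effect with grad(output wrt activation) * (corrupt - clean).
# Toy linear "model"; all names here are illustrative, not from atp_star.

def model_output(acts, weights):
    # Output logit is a weighted sum of component activations.
    return sum(w * a for w, a in zip(weights, acts))

weights = [0.5, -2.0]      # fixed readout weights
clean_acts = [1.0, 3.0]    # component activations on the clean prompt
corrupt_acts = [4.0, 1.0]  # component activations on the corrupted prompt

clean_out = model_output(clean_acts, weights)

for i, w in enumerate(weights):
    # Exact effect: re-run with component i patched to its corrupt value.
    patched = list(clean_acts)
    patched[i] = corrupt_acts[i]
    exact_effect = model_output(patched, weights) - clean_out

    # AtP estimate: gradient (= w for a linear model) times activation delta.
    atp_estimate = w * (corrupt_acts[i] - clean_acts[i])

    print(f"component {i}: exact={exact_effect:+.2f}  atp={atp_estimate:+.2f}")
```

On a real transformer the gradient comes from a single backward pass, so AtP costs two forward passes and one backward pass total, versus one forward pass per component for exact patching; that trade-off is the point of the method.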

No commits in the last 6 months.

Use this if you need to pinpoint the specific layers or neurons within a large language model that are critical for a particular output or task performance.

Not ideal if you are a general user looking to apply LLMs without needing to deeply understand their internal mechanisms.

Tags: LLM interpretability, mechanistic interpretability, AI safety, model debugging, explainable AI
Badges: No License, Stale (6m), No Package, No Dependents

Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 8 / 25
Community: 5 / 25


Stars: 20
Forks: 1
Language: Python
License: none
Last pushed: Jan 19, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/koayon/atp_star"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.