koayon/atp_star

PyTorch and NNsight implementation of AtP* (Kramár et al., 2024, Google DeepMind)

Score: 19 / 100 (Experimental)

This project helps machine learning researchers and interpretability engineers understand how large language models (LLMs) make decisions. By analyzing which parts of the model contribute most to a specific output or behavior, it provides insights into the model's internal workings. You input a trained LLM and a task, and it outputs an analysis of which model components are most responsible for that behavior.
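The component-attribution idea described above can be sketched with a toy example. AtP approximates the effect of patching a component's activation (swapping in its value from a corrupted prompt) with a first-order term: the gradient of the output with respect to that activation, times the activation difference. The minimal sketch below uses a hand-rolled linear "model" with illustrative names, not the repo's actual API; for a linear model the estimate matches the exact patched effect.

```python
# Minimal sketch of attribution patching (AtP): instead of re-running the
# model once per component (exact activation patching), approximate each
# component's effect with grad(output wrt activation) * (corrupt - clean).
# Toy linear "model"; all names here are illustrative, not from atp_star.

def model_output(acts, weights):
    # Output logit is a weighted sum of component activations.
    return sum(w * a for w, a in zip(weights, acts))

weights = [0.5, -2.0]      # fixed readout weights
clean_acts = [1.0, 3.0]    # component activations on the clean prompt
corrupt_acts = [4.0, 1.0]  # component activations on the corrupted prompt

clean_out = model_output(clean_acts, weights)

for i, w in enumerate(weights):
    # Exact effect: re-run with component i patched to its corrupt value.
    patched = list(clean_acts)
    patched[i] = corrupt_acts[i]
    exact_effect = model_output(patched, weights) - clean_out

    # AtP estimate: gradient (= w for a linear model) times activation delta.
    atp_estimate = w * (corrupt_acts[i] - clean_acts[i])

    print(f"component {i}: exact={exact_effect:+.2f}  atp={atp_estimate:+.2f}")
```

On a real transformer the gradient comes from a single backward pass, so AtP costs two forward passes and one backward pass total, versus one forward pass per component for exact patching; that trade-off is the point of the method.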

No commits in the last 6 months.

Use this if you need to pinpoint the specific layers or neurons within a large language model that are critical for a particular output or task performance.

Not ideal if you are a general user looking to apply LLMs without needing to deeply understand their internal mechanisms.

Tags: LLM interpretability, mechanistic interpretability, AI safety, model debugging, explainable AI
Badges: No License, Stale (6m), No Package, No Dependents

Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 8 / 25
Community: 5 / 25


Stars: 20
Forks: 1
Language: Python
License: none
Last pushed: Jan 19, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/koayon/atp_star"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.