Transformer Models: Mechanistic Transformer Interpretability

Tools for understanding transformer internals through visualization, attribution analysis, and mechanistic reverse-engineering of learned circuits and representations. Does NOT include general explainability frameworks, dataset analysis tools, or applications built on transformers.

There are 63 mechanistic transformer interpretability projects tracked. Three of them score 50 or above (Established tier). The highest-rated is jessevig/bertviz at 61/100 with 7,945 stars.
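The tier labels in the listing below appear to follow score bands. The sketch below is a minimal example under bands that are inferred from the listed scores, not taken from any documented rule: 50 and above is Established, 30 to 49 is Emerging, and below 30 is Experimental. No listed project scores exactly 30, so that boundary is a guess. The sketch tallies the 63 listed scores into tiers.

```python
from collections import Counter

# Scores copied from the ranked listing below (ranks 1-63, in order).
SCORES = [
    61, 60, 50, 46, 44, 42, 38, 38, 36, 36, 34, 33, 32, 31, 31, 31,
    29, 29, 27, 25, 25, 25, 25, 24, 24, 24, 23, 23, 23, 22, 22, 22, 22,
    21, 21, 21, 21, 21, 21, 21, 20, 20, 19, 19, 18, 18, 17, 17, 17, 15,
    14, 14, 13, 13, 13, 13, 13, 13, 12, 12, 10, 10, 10,
]


def tier(score: int) -> str:
    """Map a 0-100 score to a tier label.

    ASSUMPTION: these bands are inferred from the listing, not documented.
    No listed project scores exactly 30, so the 30 cutoff is a guess.
    """
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# Count how many listed projects fall into each assumed tier.
counts = Counter(tier(s) for s in SCORES)
print(len(SCORES), dict(counts))
```

Under these assumed bands the tally is 3 Established, 13 Emerging, and 47 Experimental, which matches the tier labels shown in the listing.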

Get the projects as JSON (this request sets limit=20, so it returns at most 20 of the 63):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=transformer-interpretability-mechanistic&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
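The same request can be made from Python. This is a minimal sketch using only the standard library: it rebuilds the query string from the curl example and decodes the response body as JSON without assuming any particular schema. The helper names build_url and fetch are hypothetical, and because the page does not say how the free key is sent, the sketch makes anonymous requests only.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query parameters copied from the curl example.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str = "transformers",
              subcategory: str = "transformer-interpretability-mechanistic",
              limit: int = 20) -> str:
    """Build the query URL (hypothetical helper; defaults match the curl call)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch(url: str, timeout: float = 30.0):
    """Send an anonymous GET request and decode the JSON body.

    No API key is sent (how keys are passed is not documented here),
    so each call counts against the 100 requests/day anonymous limit.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling fetch(build_url()) performs the same request as the curl command. Keeping URL construction separate from the network call makes the query easy to inspect before spending one of the daily requests.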

# Model Score Tier
1 jessevig/bertviz

BertViz: Visualize Attention in Transformer Models

61
Established
2 inseq-team/inseq

Interpretability for sequence generation models 🐛 🔍

60
Established
3 EleutherAI/knowledge-neurons

A library for finding knowledge neurons in pretrained transformer models.

50
Established
4 hila-chefer/Transformer-MM-Explainability

[ICCV 2021 Oral] Official PyTorch implementation for Generic...

46
Emerging
5 cdpierse/transformers-interpret

Model explainability that works seamlessly with 🤗 transformers. Explain your...

44
Emerging
6 taufeeque9/codebook-features

Sparse and discrete interpretability tool for neural networks

42
Emerging
7 icon-lab/BolT

Fused Window Transformers for fMRI Time Series Analysis...

38
Emerging
8 Sandipan99/IndMask

IndMask: Inductive Explanation for Multivariate Time Series Black-box Model

38
Emerging
9 bvanaken/visbert

VisBERT: Demo web app for "How Does BERT Answer Questions?"

36
Emerging
10 DFKI-NLP/thermostat

Collection of NLP model explanations and accompanying analysis tools

36
Emerging
11 jakobtroidl/neuron-shape-reasoning

PyTorch Implementation of Global Neuron Shape Reasoning with Point Affinity...

34
Emerging
12 andreped/vit-explainer

🔥 Demonstrating Explainable AI with Vision Transformer in web app

33
Emerging
13 gsarti/lcl23-xnlm-lab

Materials for the Lab "Explaining Neural Language Models from Internal...

32
Emerging
14 tongnie/ImputeFormer

[KDD 2024] "ImputeFormer: Low Rankness-Induced Transformers for...

31
Emerging
15 xmed-lab/TAM

[ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs

31
Emerging
16 ApocryphalEditor/SRM-mapping-framework

A framework for mapping the internal geometry of transformer representations...

31
Emerging
17 rubencart/LIIR-TextGraphs-14

Code for KU Leuven LIIR lab's submission to the TextGraphs-14 shared task on...

29
Experimental
18 poppingtonic/transformer-visualization

Mechanistic Interpretability Tutorials, Results and research log as I learn...

29
Experimental
19 khairulislam/Timeseries-Explained

Interpreting Deep Learning timeseries models using Local Interpretation methods

27
Experimental
20 Lumi-node/model-garage

Open the hood on neural networks. Component-level model surgery, analysis,...

25
Experimental
21 elinx/safe-view

A terminal-based application for visualizing and analyzing safetensors files.

25
Experimental
22 s4um1l/aya-cross-lingual-probe

Mechanistic interpretability of cross-lingual concept representations in...

25
Experimental
23 designer-coderajay/logit-lens-explorer

Mechanistic interpretability tool visualizing GPT-2's layer-by-layer...

25
Experimental
24 rashomon-gh/attention-visualiser

A module to visualise attention layer activations from transformer-based...

24
Experimental
25 ovshake/rat

Reverse Attention Tracer: A lightweight API to visualize which words...

24
Experimental
26 ayaka14732/TrAVis

TrAVis: Visualise BERT attention in your browser

24
Experimental
27 mims-harvard/TimeX

Time series explainability via self-supervised model behavior consistency

23
Experimental
28 munnabhaiiii981/llm-attention-visualizer

πŸ” Visualize attention patterns in transformer models to better understand...

23
Experimental
29 davor10105/relative-absolute-magnitude-propagation

Explain the outputs of your Vision Transformers, Residual Networks and...

23
Experimental
30 mytechnotalent/mechanistic_interpretability

Mechanistic Interpretability (MI) is a subfield of AI alignment and safety...

22
Experimental
31 tegridydev/mechamap

MechaMap - Toolkit for Mechanistic Interpretability (MI) Research

22
Experimental
32 sandipan211/LoCATe-GAT

Official PyTorch implementation of the IEEE TETCI 2024 paper LoCATe-GAT

22
Experimental
33 skyline-GTRr32/OKI-TRACE

OKI TRACE: Local LLM observability. See step-by-step, layer-by-layer what...

22
Experimental
34 MaxwellCalkin/interpretability-toolkit

Practical mechanistic interpretability tools: activation caching, linear...

21
Experimental
35 erfanashams/steve

Speech Self-Attention Exploratory Visual Environment

21
Experimental
36 DFKI-NLP/SMV

Code and data for the ACL 2023 NLReasoning Workshop paper "Saliency Map...

21
Experimental
37 Alvoradozerouno/ORION-MIT-Interpretability-Bridge

ORION MIT Interpretability Bridge: MIT research + consciousness...

21
Experimental
38 Benjoyo/next-token-visualization

🧠 Visualize token-by-token sampling with chat templates, nucleus filtering,...

21
Experimental
39 JihoonJeong/Neural-MRI

Model Resonance Imaging: visualize LLM internals like a brain MRI

21
Experimental
40 zzak00/nlp_with_transformers_visualizations

Visualize NLP

21
Experimental
41 designer-coderajay/induction-head-detector

Mechanistic interpretability tool to detect induction heads in GPT-2 using...

20
Experimental
42 fracapuano/brainformer

A transformer-based approach to predicting MEG readings from EEG sensory...

20
Experimental
43 dedely/XAI4EO

Towards Explainable AI4EO: an explainable DL approach for crop type mapping...

19
Experimental
44 amrohendawi/unraveling-bert-article

In this article, the factors affecting BERT's transferability are explained...

19
Experimental
45 jha-lab/dini

[Nature-SR'22] DINI: Data Imputation using Neural Inversion

18
Experimental
46 gszfwsb/AutoGnothi

Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering...

18
Experimental
47 luckyspaceOK/llm-attention-visualizer

πŸ” Visualize attention patterns in transformer models to better understand...

17
Experimental
48 alejoacelas/bayesian-transformers

Interpretability on 1-layer Transformer models that converge on the...

17
Experimental
49 sinaabbasi1/NormXLogit

The official repo for the EMNLP 2025 paper "NormXLogit: The Head-on-Top Never Lies"

17
Experimental
50 germain-hug/NeurHal

Visual Correspondence Hallucination: Towards Geometric Reasoning (Under Review)

15
Experimental
51 chizkidd/bert-masked-attention-visualizer

Visualizing and analyzing BERT self-attention heads during masked language modeling.

14
Experimental
52 garimamittal13/csai_S26

Neuroimaging preprocessing, brain decoding, and visual brain encoding using...

14
Experimental
53 Shravani018/interpreting-transformer-hallucinations

Mechanistic interpretability of transformer hallucinations via attention...

13
Experimental
54 DFKI-NLP/InterroLang

InterroLang: Exploring NLP Models and Datasets through Dialogue-based...

13
Experimental
55 rey-reypixel/NeuroWeave

A client-side simulation of NLP Transformer models. Visualizes...

13
Experimental
56 HillaryDanan/relativistic-interpretability

A geometric framework for understanding neural network reasoning through...

13
Experimental
57 Krasnomakov/openMaze_XAI

Explainable AI, attention visualization in LLM

13
Experimental
58 jacoboromerodiaz/context-mixing-audio-text

Attribution framework for analyzing audio–text context mixing in...

13
Experimental
59 icon-lab/DreaMR

Diffusion-driven Counterfactual Explanation for Functional MRI...

12
Experimental
60 VDuchauffour/transformers-visualizer

Explain your 🤗 transformers without effort! Plot the internal behavior of your model.

12
Experimental
61 Zarharan/NLP-Transformers-Interpretability

The purpose of this repository is to demonstrate how to use NLP...

10
Experimental
62 RyanHUNGry/Interpreting-Graph-Transformers-for-Long-Range-Interactions

Interpreting Graph Transformers for Long-Range Interactions proposes two...

10
Experimental
63 james-sexton96/ts-explainability

Transformer-based time series classification with explainability

10
Experimental