Transformer Interpretability (Mechanistic): Transformer Models

Tools for understanding transformer internals through visualization, attribution analysis, and mechanistic reverse-engineering of learned circuits and representations. This category does NOT include general explainability frameworks, dataset analysis tools, or applications built on transformers.

This category tracks 63 mechanistic transformer interpretability projects. Three score 50 or higher (Established tier). The highest-rated is jessevig/bertviz at 61/100 with 7,945 stars.

Get all 63 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=transformer-interpretability-mechanistic&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
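
The same request can be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_url` and `fetch_json` are hypothetical, and because the response schema is not described on this page, the sketch returns the parsed JSON as-is without assuming any field names. The default `limit=20` mirrors the curl example; whether a larger value returns all 63 projects in one response is not stated here and would need to be checked against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers",
              subcategory="transformer-interpretability-mechanistic",
              limit=20):
    """Build the query URL using the three parameters shown in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_json(url, timeout=30):
    """Fetch a URL and parse the body as JSON (no API key is sent)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a network request, so it counts against the daily quota):
#   data = fetch_json(build_url())
#   print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to inspect the exact request before spending one of the 100 keyless requests per day.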

#	Model	Score	Tier	Stars	Language
1	jessevig/bertviz BertViz: Visualize Attention in Transformer Models	61	Established	7,945	Python
2	inseq-team/inseq Interpretability for sequence generation models 🐛 🔍	60	Established	462	Python
3	EleutherAI/knowledge-neurons A library for finding knowledge neurons in pretrained transformer models.	50	Established	159	Python
4	hila-chefer/Transformer-MM-Explainability [ICCV 2021- Oral] Official PyTorch implementation for Generic...	46	Emerging	903	Jupyter Notebook
5	cdpierse/transformers-interpret Model explainability that works seamlessly with 🤗 transformers. Explain your...	44	Emerging	1,413	Jupyter Notebook
6	taufeeque9/codebook-features Sparse and discrete interpretability tool for neural networks	42	Emerging	64	Python
7	icon-lab/BolT Fused Window Transformers for fMRI Time Series Analysis...	38	Emerging	34	Python
8	Sandipan99/IndMask IndMask: Inductive Explanation for Multivariate Time Series Black-box Model	38	Emerging	5	Python
9	bvanaken/visbert VisBERT: Demo web app for "How Does BERT Answer Questions?"	36	Emerging	11	JavaScript
10	DFKI-NLP/thermostat Collection of NLP model explanations and accompanying analysis tools	36	Emerging	144	Jsonnet
11	jakobtroidl/neuron-shape-reasoning PyTorch Implementation of Global Neuron Shape Reasoning with Point Affinity...	34	Emerging	13	Jupyter Notebook
12	andreped/vit-explainer 🔥 Demonstrating Explainable AI with Vision Transformer in web app	33	Emerging	3	Python
13	gsarti/lcl23-xnlm-lab Materials for the Lab "Explaining Neural Language Models from Internal...	32	Emerging	13	Jupyter Notebook
14	tongnie/ImputeFormer [KDD 2024] "ImputeFormer: Low Rankness-Induced Transformers for...	31	Emerging	51	Python
15	xmed-lab/TAM [ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs	31	Emerging	180	Python
16	ApocryphalEditor/SRM-mapping-framework A framework for mapping the internal geometry of transformer representations...	31	Emerging	2	Python
17	rubencart/LIIR-TextGraphs-14 Code for KU Leuven LIIR lab's submission to the TextGraphs-14 shared task on...	29	Experimental	1	Python
18	poppingtonic/transformer-visualization Mechanistic Interpretability Tutorials, Results and research log as I learn...	29	Experimental	9	Jupyter Notebook
19	khairulislam/Timeseries-Explained Interpreting Deep Learning timeseries models using Local Interpretation methods	27	Experimental	12	Jupyter Notebook
20	Lumi-node/model-garage Open the hood on neural networks. Component-level model surgery, analysis,...	25	Experimental	3	Python
21	elinx/safe-view A terminal-based application for visualizing and analyzing safetensors files.	25	Experimental	1	Python
22	s4um1l/aya-cross-lingual-probe Mechanistic interpretability of cross-lingual concept representations in...	25	Experimental	5	Python
23	designer-coderajay/logit-lens-explorer Mechanistic interpretability tool visualizing GPT-2's layer-by-layer...	25	Experimental	2	Python
24	rashomon-gh/attention-visualiser a module to visualise attention layer activations from transformer based...	24	Experimental	3	Python
25	ovshake/rat Reverse Attention Tracer: A lightweight API to visualize which words...	24	Experimental	4	Python
26	ayaka14732/TrAVis TrAVis: Visualise BERT attention in your browser	24	Experimental	58	Python
27	mims-harvard/TimeX Time series explainability via self-supervised model behavior consistency	23	Experimental	54	Python
28	munnabhaiiii981/llm-attention-visualizer 🔍 Visualize attention patterns in transformer models to better understand...	23	Experimental	—	Python
29	davor10105/relative-absolute-magnitude-propagation Explain the outputs of your Vision Transformers, Residual Networks and...	23	Experimental	4	Python
30	mytechnotalent/mechanistic_interpretability Mechanistic Interpretability (MI) is a subfield of AI alignment and safety...	22	Experimental	1	Jupyter Notebook
31	tegridydev/mechamap MechaMap - Toolkit for Mechanistic Interpretability (MI) Research	22	Experimental	6	Python
32	sandipan211/LoCATe-GAT Official PyTorch implementation of the IEEE TETCI 2024 paper LoCATe-GAT	22	Experimental	7	Python
33	skyline-GTRr32/OKI-TRACE OKI TRACE: Local LLM observability. See step-by-step, layer-by-layer what...	22	Experimental	1	Python
34	MaxwellCalkin/interpretability-toolkit Practical mechanistic interpretability tools — activation caching, linear...	21	Experimental	—	Python
35	erfanashams/steve Speech Self-Attention Exploratory Visual Environment	21	Experimental	4	Python
36	DFKI-NLP/SMV Code and data for the ACL 2023 NLReasoning Workshop paper "Saliency Map...	21	Experimental	9	Python
37	Alvoradozerouno/ORION-MIT-Interpretability-Bridge ORION MIT Interpretability Bridge — MIT research + consciousness...	21	Experimental	—	—
38	Benjoyo/next-token-visualization 🧠 Visualize token-by-token sampling with chat templates, nucleus filtering,...	21	Experimental	—	HTML
39	JihoonJeong/Neural-MRI Model Resonance Imaging — visualize LLM internals like a brain MRI	21	Experimental	—	TypeScript
40	zzak00/nlp_with_transformers_visualizations Visualize NLP	21	Experimental	9	—
41	designer-coderajay/induction-head-detector Mechanistic interpretability tool to detect induction heads in GPT-2 using...	20	Experimental	1	Python
42	fracapuano/brainformer A transformer-based approach to predicting MEG readings from EEG sensory...	20	Experimental	5	Python
43	dedely/XAI4EO Towards Explainable AI4EO: an explainable DL approach for crop type mapping...	19	Experimental	4	Python
44	amrohendawi/unraveling-bert-article In this article, the factors affecting BERT's transferability is explained...	19	Experimental	3	HTML
45	jha-lab/dini [Nature-SR'22] DINI: Data Imputation using Neural Inversion	18	Experimental	2	Python
46	gszfwsb/AutoGnothi Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering...	18	Experimental	24	Python
47	luckyspaceOK/llm-attention-visualizer 🔍 Visualize attention patterns in transformer models to better understand...	17	Experimental	—	Python
48	alejoacelas/bayesian-transformers Interpretability on 1-layer Transformer models that converge on the...	17	Experimental	1	Jupyter Notebook
49	sinaabbasi1/NormXLogit The official repo for the EMNLP 2025 paper "NormXLogit: The Head-on-Top Never Lies"	17	Experimental	—	Jupyter Notebook
50	germain-hug/NeurHal Visual Correspondence Hallucination: Towards Geometric Reasoning (Under Review)	15	Experimental	29	—
51	chizkidd/bert-masked-attention-visualizer Visualizing and analyzing BERT self-attention heads during masked language modeling.	14	Experimental	1	Python
52	garimamittal13/csai_S26 Neuroimaging preprocessing, brain decoding, and visual brain encoding using...	14	Experimental	—	Jupyter Notebook
53	Shravani018/interpreting-transformer-hallucinations Mechanistic interpretability of transformer hallucinations via attention...	13	Experimental	—	HTML
54	DFKI-NLP/InterroLang InterroLang: Exploring NLP Models and Datasets through Dialogue-based...	13	Experimental	9	Python
55	rey-reypixel/NeuroWeave A client-side simulation of NLP Transformer models. Visualizes...	13	Experimental	—	TypeScript
56	HillaryDanan/relativistic-interpretability A geometric framework for understanding neural network reasoning through...	13	Experimental	—	Python
57	Krasnomakov/openMaze_XAI Explainable AI, attention visualization in LLM	13	Experimental	—	HTML
58	jacoboromerodiaz/context-mixing-audio-text Attribution framework for analyzing audio–text context mixing in...	13	Experimental	—	Jupyter Notebook
59	icon-lab/DreaMR Diffusion-driven Counterfactual Explanation for Functional MRI...	12	Experimental	6	Python
60	VDuchauffour/transformers-visualizer Explain your 🤗 transformers without effort! Plot the internal behavior of your model.	12	Experimental	1	Python
61	Zarharan/NLP-Transformers-Interpretability The purpose of this repository is to demonstrate how to use NLP...	10	Experimental	2	Python
62	RyanHUNGry/Interpreting-Graph-Transformers-for-Long-Range-Interactions Interpreting Graph Transformers for Long-Range Interactions proposes two...	10	Experimental	2	Jupyter Notebook
63	james-sexton96/ts-explainability Transformer-based time series classification with explainability	10	Experimental	1	Python
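
For readers who copy the table rather than call the API, the rows can be parsed directly. This is a sketch that assumes each row is tab-separated with the six columns from the header (#, Model, Score, Tier, Stars, Language), that the repository slug is the first space-delimited token of the Model column, and that a dash marks a missing value. The function name `parse_row` and the sample subset are illustrative only.

```python
from collections import Counter

MISSING = "\u2014"  # the dash the table shows when a value is absent


def parse_row(line):
    """Parse one tab-separated table row into a dict."""
    rank, model, score, tier, stars, language = line.rstrip("\n").split("\t")
    # The Model column holds "owner/repo" followed by the description.
    repo, _, description = model.partition(" ")
    return {
        "rank": int(rank),
        "repo": repo,
        "description": description,
        "score": int(score),
        "tier": tier,
        # Star counts use thousands separators, e.g. "7,945".
        "stars": None if stars == MISSING else int(stars.replace(",", "")),
        "language": None if language == MISSING else language,
    }


# Four rows copied from the table above (a subset, not the full list).
SAMPLE_ROWS = [
    "1\tjessevig/bertviz BertViz: Visualize Attention in Transformer Models\t61\tEstablished\t7,945\tPython",
    "16\tApocryphalEditor/SRM-mapping-framework A framework for mapping the internal geometry of transformer representations...\t31\tEmerging\t2\tPython",
    "40\tzzak00/nlp_with_transformers_visualizations Visualize NLP\t21\tExperimental\t9\t\u2014",
    "52\tgarimamittal13/csai_S26 Neuroimaging preprocessing, brain decoding, and visual brain encoding using...\t14\tExperimental\t\u2014\tJupyter Notebook",
]

parsed = [parse_row(row) for row in SAMPLE_ROWS]
tier_counts = Counter(row["tier"] for row in parsed)
```

Running the same tally over all 63 rows gives the per-tier counts for the whole category.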