LLM Interpretability and Explainability Transformer Models

There are 54 LLM interpretability and explainability models tracked. The highest-rated is MadryLab/context-cite, with a score of 49/100 and 325 stars.

Get all 54 projects as JSON (the example request below sets limit=20):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-interpretability-explainability&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
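
The same request can be issued from Python. This is a minimal sketch using only the standard library; the endpoint and query parameters are copied from the curl command above, while the function names `build_url` and `fetch` are hypothetical helpers introduced here. The shape of the returned JSON is not documented on this page, so the sketch returns the decoded payload without assuming any field names. Whether the API accepts a `limit` larger than 20 is also an assumption to check.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the request URL with the query parameters from the curl example."""
    params = {
        "domain": "transformers",
        "subcategory": "llm-interpretability-explainability",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch(limit=20, timeout=30):
    """Fetch and decode the JSON response.

    The response schema is not described on this page, so the decoded
    payload is returned as-is (assumption: the body is valid JSON).
    """
    with urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the URL only; call fetch() to make the actual network request.
    print(build_url())
```

Calling `fetch()` counts against the daily request limit described above, so it may be worth caching the result locally rather than refetching.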

#	Model (repository and description)	Score (out of 100)	Tier	Stars	Language
1	MadryLab/context-cite Attribute (or cite) statements generated by LLMs back to in-context information.	49	Emerging	325	Jupyter Notebook
2	microsoft/augmented-interpretable-models Interpretable and efficient predictors using pre-trained language models....	48	Emerging	44	Jupyter Notebook
3	Trustworthy-ML-Lab/CB-LLMs [ICLR 25] A novel framework for building intrinsically interpretable LLMs...	44	Emerging	31	Python
4	poloclub/LLM-Attributor LLM Attributor: Attribute LLM's Generated Text to Training Data	41	Emerging	76	Jupyter Notebook
5	THUDM/LongCite LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA	39	Emerging	519	Python
6	UKPLab/5pils Code associated with the EMNLP 2024 Main paper: "Image, tell me your story!"...	39	Emerging	45	Python
7	hao-ai-lab/Consistency_LLM [ICML 2024] CLLMs: Consistency Large Language Models	38	Emerging	413	Python
8	yueyu1030/AttrPrompt [NeurIPS 2023] This is the code for the paper `Large Language Model as...	38	Emerging	156	Python
9	nlpkeg/Know-MRI This is an official code for the [ACL 2025 Demo] paper: Know-MRI: A...	37	Emerging	14	Jupyter Notebook
10	leap-laboratories/PIZZA An attribution library for LLMs	37	Emerging	46	Python
11	phvv-me/frame-representation-hypothesis Official Repository for Frame Representation Hypothesis paper	36	Emerging	8	Jupyter Notebook
12	msakarvadia/memorization Localizing Memorized Sequences in Language Models	36	Emerging	20	Jupyter Notebook
13	ntt-dkiku/route-explainer The official implementation of "RouteExplainer: An Explanation Framework for...	35	Emerging	17	Python
14	AI4LIFE-GROUP/LLM_Explainer Code for paper: Are Large Language Models Post Hoc Explainers?	35	Emerging	34	Jupyter Notebook
15	itsqyh/Awesome-LMMs-Mechanistic-Interpretability A curated collection of resources focused on the Mechanistic...	34	Emerging	192	—
16	microsoft/MMLU-CF A Contamination-free Multi-task Language Understanding Benchmark [Official, ACL 2025]	33	Emerging	123	—
17	parameterlab/apricot Source code of "Calibrating Large Language Models Using Their Generations...	33	Emerging	22	Jupyter Notebook
18	songxiaoshuai/progco Official Implementation of "ProgCo: Program Helps Self-Correction of Large...	33	Emerging	5	Python
19	yinzhangyue/SelfAware Do Large Language Models Know What They Don’t Know?	32	Emerging	102	Python
20	jwergieluk/revllm RevLLM -- Reverse Engineering Tools for Large Language Models	31	Emerging	18	Python
21	Trustworthy-ML-Lab/VLG-CBM [NeurIPS 24] A new training and evaluation framework for learning...	31	Emerging	29	Jupyter Notebook
22	llm-misinformation/llm-misinformation The dataset and code for the ICLR 2024 paper "Can LLM-Generated...	30	Emerging	81	Shell
23	salesforce/factualNLG Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing...	30	Emerging	61	Jupyter Notebook
24	Zhang-Yihao/Adversarial-Representation-Engineering Official implementation repository for the paper Towards General Conceptual...	30	Emerging	19	Python
25	plusnli/medical-knowledge-judgment Codes and data for paper "Fact or Guesswork? Evaluating Large Language...	29	Experimental	6	Python
26	UKPLab/arxiv2025-misleading-visualizations Code and datasets accompanying the arXiv preprint: "Protecting multimodal...	29	Experimental	4	JavaScript
27	gsarti/pecore Materials for "Quantifying the Plausibility of Context Reliance in Neural...	27	Experimental	15	Jupyter Notebook
28	yyy01/PAC The official implementation of the paper "Data Contamination Calibration for...	27	Experimental	16	Python
29	LFhase/CausalCOAT [NeurIPS 2024] Discovery of the Hidden World with Large Language Models	27	Experimental	8	Jupyter Notebook
30	Trustworthy-ML-Lab/Describe-and-Dissect [TMLR 25] An automated method for explaining complex neuron behaviors in...	26	Experimental	10	Jupyter Notebook
31	bgreenwell/statlingua Explain Statistical Output with Large Language Models	25	Experimental	10	R
32	Strong-AI-Lab/Explanation-Generation We introduce "ILearner-LLM" a framework that uses iterative enhancement with...	24	Experimental	2	Python
33	Human-Centric-Machine-Learning/counterfactual-llms Code for "Counterfactual Token Generation in Large Language Models", Arxiv 2024.	24	Experimental	32	Jupyter Notebook
34	HamedBabaei/CoLLM CoLLM: Consistency of Large Language Models in Knowledge Engineering	23	Experimental	1	Python
35	tbohne/saliency_kd Saliency map-guided knowledge discovery for subclass identification with...	23	Experimental	1	Jupyter Notebook
36	Trustworthy-ML-Lab/Efficient-LLM-automated-interpretability [NeurIPS'23 ATTRIB] An efficient framework to generate neuron explanations for LLMs	22	Experimental	6	Python
37	braingpt-lovelab/backwards Source code for	21	Experimental	4	Jupyter Notebook
38	Faisalse/LLM-reproducibility-audit https://faisalse.github.io/LLM-reproducibility-audit/	21	Experimental	—	CSS
39	Aniezka/xfact-fever Official repository of FEVER@ACL 2025 paper "When Scale Meets Diversity:...	21	Experimental	7	—
40	kasia-kobalczyk/guess_llm Implementation of the probing models presented in the ICLR 2026 paper...	21	Experimental	—	Jupyter Notebook
41	AikyamLab/llm-memorization Understanding the memorization property of Large Language Models using Model...	21	Experimental	9	Python
42	psunlpgroup/VerbosityLLM This repository maintains dataset, predictions, and code for paper:...	20	Experimental	5	Python
43	k-randl/self-explaining_llms Official implementation of the papers "Evaluating the Reliability of...	19	Experimental	1	Jupyter Notebook
44	dennismstfc/building-the-soedermizer Building the Söd★mizer: AI-Driven Machine Translation for Gender-Sensitive...	19	Experimental	3	Python
45	zhaochen0110/LMLM Code and data for "Improving Temporal Generalization of Pre-trained Language...	19	Experimental	18	Python
46	lindaCai1997/data-attribution Scalable Gradient-Based Attribution of LLM Behaviors	17	Experimental	5	Python
47	RManLuo/llm-facteval Source code of paper "Systematic Assessment of Factual Knowledge in Large...	14	Experimental	17	Python
48	stvsever/aHFR_TokenSHAP This repository implements an adaptive, hierarchy-aware Shapley method for...	13	Experimental	—	Python
49	gianniskalyvas/llm-posthoc-explainability A study on post-hoc explainability in LLMs using counterfactual...	13	Experimental	—	Jupyter Notebook
50	ShiningLab/CON2LM This repository is for the paper Word Surprisal Correlates with Sentential...	13	Experimental	—	Jupyter Notebook
51	froge159/belief-project-sef Activation-Space Interventions for Causal Control of Belief Representations...	13	Experimental	—	Jupyter Notebook
52	BridgeAI-Lab/LLM-as-Meta-Reviewer [NAACL'25] Dataset and Evaluation Code for Paper LLMs as Meta-Reviewers’...	12	Experimental	6	Jupyter Notebook
53	abhilash-neog/FactCheckingBioLLMs Evaluating the reasoning ability of LLMs specifically within the biomedical...	12	Experimental	2	Jupyter Notebook
54	ExcellentDarkTea/LLM-Causal-Discovery Using LLMs to support expert elicitation in causal discovery, combining...	10	Experimental	1	Jupyter Notebook
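
For readers who want to work with the table above as plain text, the rows can be split into records. This is a rough sketch, assuming each row is tab-separated into six fields and that the second field holds the repository name followed by a space and its description, as in the listing above. The function name `parse_row` and the dictionary keys are hypothetical, and the dash placeholder is treated as a missing value.

```python
MISSING = "\u2014"  # the dash character the table uses for missing values


def parse_row(line):
    """Split one tab-separated table row into a dictionary.

    Assumes six tab-separated fields: rank, model, score, tier, stars, language.
    """
    rank, model, score, tier, stars, language = line.rstrip("\n").split("\t")
    # The model field is "owner/repo" followed by a free-text description.
    repo, _, description = model.partition(" ")
    return {
        "rank": int(rank),
        "repo": repo,
        "description": description,
        "score": int(score),
        "tier": tier,
        "stars": None if stars == MISSING else int(stars),
        "language": None if language == MISSING else language,
    }


if __name__ == "__main__":
    sample = "10\tleap-laboratories/PIZZA An attribution library for LLMs\t37\tEmerging\t46\tPython"
    print(parse_row(sample))
```

Descriptions that end in "..." are truncated in the listing itself, and the parser keeps them exactly as they appear.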