LLM Interpretability and Explainability Transformer Models
There are 54 LLM interpretability and explainability models tracked. The highest-rated is MadryLab/context-cite, with a score of 49/100 and 325 stars.
Get the projects as JSON (note that the example request below sets `limit=20`):
```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-interpretability-explainability&limit=20"
```
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
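A minimal sketch of calling the same endpoint from a script, assuming it accepts exactly the three query parameters visible in the curl example (`domain`, `subcategory`, `limit`). The function name `fetch_quality` is hypothetical. This page does not say how the free key is sent, so the sketch sends none and stays within the anonymous 100 requests/day.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the quality endpoint shown in the curl example.
# Assumption: the endpoint takes only domain, subcategory and limit, as in that URL.
set -euo pipefail

BASE_URL="https://pt-edge.onrender.com/api/v1/datasets/quality"

fetch_quality() {
  # Number of results to request; defaults to 20, matching the example URL.
  local limit="${1:-20}"
  # -G sends a GET request and appends the --data-urlencode pairs as the query string.
  # -f makes curl exit non-zero on HTTP status codes of 400 and above,
  # -s hides the progress meter, -S still reports errors on stderr.
  curl -fsS -G "$BASE_URL" \
    --data-urlencode "domain=transformers" \
    --data-urlencode "subcategory=llm-interpretability-explainability" \
    --data-urlencode "limit=${limit}"
}
```

For example, `fetch_quality 54 > llm-interpretability.json` would request all 54 entries in one call, provided the server does not cap `limit`; no cap is documented here, so treat that as an assumption. Letting curl's `--data-urlencode` build the query string keeps values with spaces or special characters safe if you swap in another subcategory.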
| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | MadryLab/context-cite | Attribute (or cite) statements generated by LLMs back to in-context information. | 49 | Emerging |
| 2 | microsoft/augmented-interpretable-models | Interpretable and efficient predictors using pre-trained language models.... | | Emerging |
| 3 | Trustworthy-ML-Lab/CB-LLMs | [ICLR 25] A novel framework for building intrinsically interpretable LLMs... | | Emerging |
| 4 | poloclub/LLM-Attributor | LLM Attributor: Attribute LLM's Generated Text to Training Data | | Emerging |
| 5 | THUDM/LongCite | LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA | | Emerging |
| 6 | UKPLab/5pils | Code associated with the EMNLP 2024 Main paper: "Image, tell me your story!"... | | Emerging |
| 7 | hao-ai-lab/Consistency_LLM | [ICML 2024] CLLMs: Consistency Large Language Models | | Emerging |
| 8 | yueyu1030/AttrPrompt | [NeurIPS 2023] This is the code for the paper `Large Language Model as... | | Emerging |
| 9 | nlpkeg/Know-MRI | This is an official code for the [ACL 2025 Demo] paper: Know-MRI: A... | | Emerging |
| 10 | leap-laboratories/PIZZA | An attribution library for LLMs | | Emerging |
| 11 | phvv-me/frame-representation-hypothesis | Official Repository for Frame Representation Hypothesis paper | | Emerging |
| 12 | msakarvadia/memorization | Localizing Memorized Sequences in Language Models | | Emerging |
| 13 | ntt-dkiku/route-explainer | The official implementation of "RouteExplainer: An Explanation Framework for... | | Emerging |
| 14 | AI4LIFE-GROUP/LLM_Explainer | Code for paper: Are Large Language Models Post Hoc Explainers? | | Emerging |
| 15 | itsqyh/Awesome-LMMs-Mechanistic-Interpretability | A curated collection of resources focused on the Mechanistic... | | Emerging |
| 16 | microsoft/MMLU-CF | A Contamination-free Multi-task Language Understanding Benchmark [Official, ACL 2025] | | Emerging |
| 17 | parameterlab/apricot | Source code of "Calibrating Large Language Models Using Their Generations... | | Emerging |
| 18 | songxiaoshuai/progco | Official Implementation of "ProgCo: Program Helps Self-Correction of Large... | | Emerging |
| 19 | yinzhangyue/SelfAware | Do Large Language Models Know What They Don’t Know? | | Emerging |
| 20 | jwergieluk/revllm | RevLLM -- Reverse Engineering Tools for Large Language Models | | Emerging |
| 21 | Trustworthy-ML-Lab/VLG-CBM | [NeurIPS 24] A new training and evaluation framework for learning... | | Emerging |
| 22 | llm-misinformation/llm-misinformation | The dataset and code for the ICLR 2024 paper "Can LLM-Generated... | | Emerging |
| 23 | salesforce/factualNLG | Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing... | | Emerging |
| 24 | Zhang-Yihao/Adversarial-Representation-Engineering | Official implementation repository for the paper Towards General Conceptual... | | Emerging |
| 25 | plusnli/medical-knowledge-judgment | Codes and data for paper "Fact or Guesswork? Evaluating Large Language... | | Experimental |
| 26 | UKPLab/arxiv2025-misleading-visualizations | Code and datasets accompanying the arXiv preprint: "Protecting multimodal... | | Experimental |
| 27 | gsarti/pecore | Materials for "Quantifying the Plausibility of Context Reliance in Neural... | | Experimental |
| 28 | yyy01/PAC | The official implementation of the paper "Data Contamination Calibration for... | | Experimental |
| 29 | LFhase/CausalCOAT | [NeurIPS 2024] Discovery of the Hidden World with Large Language Models | | Experimental |
| 30 | Trustworthy-ML-Lab/Describe-and-Dissect | [TMLR 25] An automated method for explaining complex neuron behaviors in... | | Experimental |
| 31 | bgreenwell/statlingua | Explain Statistical Output with Large Language Models | | Experimental |
| 32 | Strong-AI-Lab/Explanation-Generation | We introduce "ILearner-LLM" a framework that uses iterative enhancement with... | | Experimental |
| 33 | Human-Centric-Machine-Learning/counterfactual-llms | Code for "Counterfactual Token Generation in Large Language Models", Arxiv 2024. | | Experimental |
| 34 | HamedBabaei/CoLLM | CoLLM: Consistency of Large Language Models in Knowledge Engineering | | Experimental |
| 35 | tbohne/saliency_kd | Saliency map-guided knowledge discovery for subclass identification with... | | Experimental |
| 36 | Trustworthy-ML-Lab/Efficient-LLM-automated-interpretability | [NeurIPS'23 ATTRIB] An efficient framework to generate neuron explanations for LLMs | | Experimental |
| 37 | braingpt-lovelab/backwards | Source code for | | Experimental |
| 38 | Faisalse/LLM-reproducibility-audit | https://faisalse.github.io/LLM-reproducibility-audit/ | | Experimental |
| 39 | Aniezka/xfact-fever | Official repository of FEVER@ACL 2025 paper "When Scale Meets Diversity:... | | Experimental |
| 40 | kasia-kobalczyk/guess_llm | Implementation of the probing models presented in the ICLR 2026 paper... | | Experimental |
| 41 | AikyamLab/llm-memorization | Understanding the memorization property of Large Language Models using Model... | | Experimental |
| 42 | psunlpgroup/VerbosityLLM | This repository maintains dataset, predictions, and code for paper:... | | Experimental |
| 43 | k-randl/self-explaining_llms | Official implementation of the papers "Evaluating the Reliability of... | | Experimental |
| 44 | dennismstfc/building-the-soedermizer | Building the Söd★mizer: AI-Driven Machine Translation for Gender-Sensitive... | | Experimental |
| 45 | zhaochen0110/LMLM | Code and data for "Improving Temporal Generalization of Pre-trained Language... | | Experimental |
| 46 | lindaCai1997/data-attribution | Scalable Gradient-Based Attribution of LLM Behaviors | | Experimental |
| 47 | RManLuo/llm-facteval | Source code of paper "Systematic Assessment of Factual Knowledge in Large... | | Experimental |
| 48 | stvsever/aHFR_TokenSHAP | This repository implements an adaptive, hierarchy-aware Shapley method for... | | Experimental |
| 49 | gianniskalyvas/llm-posthoc-explainability | A study on post-hoc explainability in LLMs using counterfactual... | | Experimental |
| 50 | ShiningLab/CON2LM | This repository is for the paper Word Surprisal Correlates with Sentential... | | Experimental |
| 51 | froge159/belief-project-sef | Activation-Space Interventions for Causal Control of Belief Representations... | | Experimental |
| 52 | BridgeAI-Lab/LLM-as-Meta-Reviewer | [NAACL'25] Dataset and Evaluation Code for Paper LLMs as Meta-Reviewers’... | | Experimental |
| 53 | abhilash-neog/FactCheckingBioLLMs | Evaluating the reasoning ability of LLMs specifically within the biomedical... | | Experimental |
| 54 | ExcellentDarkTea/LLM-Causal-Discovery | Using LLMs to support expert elicitation in causal discovery, combining... | | Experimental |