LLM Interpretability & Explainability Tools
Tools and frameworks for understanding, explaining, and visualizing how large language models make decisions through mechanistic analysis, post-hoc explanations, concept-based interpretability, and neuron-level attribution methods. Does NOT include general model evaluation, bias detection, hallucination mitigation, or knowledge editing.
30 LLM interpretability and explainability tools are tracked. The highest-rated is filipnaudot/llmSHAP at 49/100 with 16 stars.
Get all 30 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-interpretability-explainability&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
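The same request can be made from Python. This is a minimal sketch using only the standard library; the endpoint and query parameters are copied from the curl command above, while the helper names `build_url` and `fetch_projects` are hypothetical. The shape of the JSON response is not documented here, so the code returns the parsed payload untouched. Note that the command above passes `limit=20`; whether a larger value returns all 30 projects is an assumption that would need checking against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the query URL; limit=20 mirrors the curl example."""
    params = {
        "domain": "llm-tools",
        "subcategory": "llm-interpretability-explainability",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=10):
    """Fetch and parse the JSON payload (hypothetical helper).

    The response schema is not assumed; the parsed object is
    returned as-is for the caller to inspect.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a network request, so it is left commented out):
# data = fetch_projects()
# print(json.dumps(data, indent=2)[:500])
```

Anonymous access is limited to 100 requests/day, so caching the result locally is a reasonable choice when experimenting.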
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | filipnaudot/llmSHAP<br>llmSHAP: a multi-threaded explainability framework using Shapley values for... | | Emerging |
| 2 | microsoft/automated-brain-explanations<br>Generating and validating natural-language explanations for the brain. | | Emerging |
| 3 | CAS-SIAT-XinHai/CPsyCoun<br>[ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and... | | Emerging |
| 4 | wesg52/universal-neurons<br>Universal Neurons in GPT2 Language Models | | Emerging |
| 5 | ICTMCG/LLM-for-misinformation-research<br>Paper list of misinformation research using (multi-modal) large language... | | Emerging |
| 6 | marcusm117/IdentityChain<br>[ICLR 2024] Beyond Accuracy: Evaluating Self-Consistency of Code Large... | | Emerging |
| 7 | shahriargolchin/DCQ<br>The official repository for the paper entitled "Data Contamination Quiz: A... | | Emerging |
| 8 | Wang-ML-Lab/interpretable-foundation-models<br>[ICML 2024] Probabilistic Conceptual Explainers (PACE): Trustworthy... | | Emerging |
| 9 | OpenMOSS/Say-I-Dont-Know<br>[ICML'2024] Can AI Assistants Know What They Don't Know? | | Emerging |
| 10 | amazon-science/ContraCLM<br>[ACL 2023] Code for ContraCLM: Contrastive Learning For Causal Language Model | | Experimental |
| 11 | OSU-NLP-Group/AttrScore<br>Code, datasets, models for the paper "Automatic Evaluation of Attribution by... | | Experimental |
| 12 | MozerWang/DEMO<br>[ACL 2025 (Findings)] DEMO: Reframing Dialogue Interaction with Fine-grained... | | Experimental |
| 13 | YuweiYin/SWI<br>SWI: Speaking with Intent in Large Language Models | | Experimental |
| 14 | stefdesabbata/geospatial-mechanistic-interpretability<br>Geospatial Mechanistic Interpretability of Large Language Models | | Experimental |
| 15 | 12kimih/HiCUPID<br>[ACL 2025] Exploring the Potential of LLMs as Personalized Assistants:... | | Experimental |
| 16 | Joe-b-20/CoreVital<br>Mechanistic interpretability toolkit for monitoring LLM internal health.... | | Experimental |
| 17 | DataScienceUIBK/llm-reranking-generalization-study<br>How Good are LLM-based Rerankers? Accepted at EMNLP Findings 2025 | | Experimental |
| 18 | AColonnaDistria/llm2sql-consistency-analysis<br>LLM-to-SQL analysis tool designed to quantify non-determinism behavior of... | | Experimental |
| 19 | jiangjiechen/uncommongen<br>Resources for our ACL 2023 paper: "Say What You Mean! Large Language Models... | | Experimental |
| 20 | Nearzero-S/Intuitive-MechInterp<br>Helping Humans Understand Our Processing | | Experimental |
| 21 | youzhaozhao/LLM-Heuristic-Graph-Coloring<br>Exploring LLM-assisted design of graph coloring heuristics through ... | | Experimental |
| 22 | DAMO-NLP-SG/LLM-argumentation<br>[ACL2024] Exploring the Potential of Large Language Models in Computational... | | Experimental |
| 23 | GovAIx/QualityModulation<br>[Nature Communications] Linguistic features of AI mis/disinformation and the... | | Experimental |
| 24 | armlynobinguar/LLM-XAI-Papers<br>A curated collection of research papers on explainability and... | | Experimental |
| 25 | emanuelemessina/broken-morals<br>Moral copilot for high-stakes ethical decisions in business contexts | | Experimental |
| 26 | 3B-Group/ConvRe<br>🤖ConvRe🤯: An Investigation of LLMs’ Inefficacy in Understanding Converse... | | Experimental |
| 27 | phvv-me/icip2025<br>Official Repository for Vision Language Model Interpretability with Concept... | | Experimental |
| 28 | pvicinanza/llm_prompt_tuning_conspiracies<br>This repository provides the data and code needed to replicate "Semantic... | | Experimental |
| 29 | ChuanMeng/SIP<br>Code for the CIKM 2023 long paper: System Initiative Prediction for... | | Experimental |
| 30 | vbainwala/Benchmarking-LLMs-Indic-Languages<br>Benchmarking Study of Bloomz-560m, mBART-large, IndicBART on the Indic Languages | | Experimental |