LLM Interpretability & Explainability Tools

Tools and frameworks for understanding, explaining, and visualizing how large language models make decisions through mechanistic analysis, post-hoc explanations, concept-based interpretability, and neuron-level attribution methods. Does NOT include general model evaluation, bias detection, hallucination mitigation, or knowledge editing.

30 LLM interpretability & explainability tools are tracked. The highest-rated is filipnaudot/llmSHAP at 49/100 with 16 stars.

Get the tracked projects as JSON (the example request sets limit=20, so it may return fewer than all 30):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-interpretability-explainability&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
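The same request can be made from Python. This is a minimal sketch using only the standard library; the endpoint and query parameters are copied from the curl command above, while the helper names `build_url` and `fetch_projects` are hypothetical. The response schema is not documented here, so the sketch returns the parsed JSON without assuming any field names.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools",
              subcategory="llm-interpretability-explainability",
              limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and parse the JSON response.

    The structure of the returned object is an unknown here, so it is
    handed back as-is for the caller to inspect.
    """
    with urllib.request.urlopen(build_url(limit=limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` issues one request against the daily quota. Whether a `limit` above 20 is accepted is an assumption that should be checked against the API itself.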

#	Tool	Description	Score	Tier	Stars	Language
1	filipnaudot/llmSHAP	llmSHAP: a multi-threaded explainability framework using Shapley values for...	49	Emerging	16	Python
2	microsoft/automated-brain-explanations	Generating and validating natural-language explanations for the brain.	48	Emerging	63	Jupyter Notebook
3	CAS-SIAT-XinHai/CPsyCoun	[ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and...	44	Emerging	218	Jupyter Notebook
4	wesg52/universal-neurons	Universal Neurons in GPT2 Language Models	39	Emerging	30	Jupyter Notebook
5	ICTMCG/LLM-for-misinformation-research	Paper list of misinformation research using (multi-modal) large language...	38	Emerging	321	—
6	marcusm117/IdentityChain	[ICLR 2024] Beyond Accuracy: Evaluating Self-Consistency of Code Large...	36	Emerging	10	Python
7	shahriargolchin/DCQ	The official repository for the paper entitled "Data Contamination Quiz: A...	35	Emerging	6	Python
8	Wang-ML-Lab/interpretable-foundation-models	[ICML 2024] Probabilistic Conceptual Explainers (PACE): Trustworthy...	31	Emerging	18	Python
9	OpenMOSS/Say-I-Dont-Know	[ICML'2024] Can AI Assistants Know What They Don't Know?	30	Emerging	85	Python
10	amazon-science/ContraCLM	[ACL 2023] Code for ContraCLM: Contrastive Learning For Causal Language Model	29	Experimental	35	Python
11	OSU-NLP-Group/AttrScore	Code, datasets, models for the paper "Automatic Evaluation of Attribution by...	29	Experimental	56	Python
12	MozerWang/DEMO	[ACL 2025 (Findings)] DEMO: Reframing Dialogue Interaction with Fine-grained...	26	Experimental	22	Python
13	YuweiYin/SWI	SWI: Speaking with Intent in Large Language Models	26	Experimental	6	Python
14	stefdesabbata/geospatial-mechanistic-interpretability	Geospatial Mechanistic Interpretability of Large Language Models	23	Experimental	18	Jupyter Notebook
15	12kimih/HiCUPID	[ACL 2025] Exploring the Potential of LLMs as Personalized Assistants:...	22	Experimental	14	Python
16	Joe-b-20/CoreVital	Mechanistic interpretability toolkit for monitoring LLM internal health....	22	Experimental	—	Python
17	DataScienceUIBK/llm-reranking-generalization-study	How Good are LLM-based Rerankers? Accepted at EMNLP Findings 2025	22	Experimental	12	—
18	AColonnaDistria/llm2sql-consistency-analysis	LLM-to-SQL analysis tool designed to quantify non-determinism behavior of...	21	Experimental	—	Python
19	jiangjiechen/uncommongen	Resources for our ACL 2023 paper: "Say What You Mean! Large Language Models...	21	Experimental	9	Python
20	Nearzero-S/Intuitive-MechInterp	Helping Humans Understand Our Processing	21	Experimental	—	—
21	youzhaozhao/LLM-Heuristic-Graph-Coloring	Exploring LLM-assisted design of graph coloring heuristics through ...	19	Experimental	—	Jupyter Notebook
22	DAMO-NLP-SG/LLM-argumentation	[ACL2024] Exploring the Potential of Large Language Models in Computational...	19	Experimental	17	Python
23	GovAIx/QualityModulation	[Nature Communications] Linguistic features of AI mis/disinformation and the...	17	Experimental	—	Jupyter Notebook
24	armlynobinguar/LLM-XAI-Papers	A curated collection of research papers on explainability and...	17	Experimental	—	Python
25	emanuelemessina/broken-morals	Moral copilot for high-stakes ethical decisions in business contexts	15	Experimental	—	TeX
26	3B-Group/ConvRe	🤖ConvRe🤯: An Investigation of LLMs’ Inefficacy in Understanding Converse...	14	Experimental	24	Python
27	phvv-me/icip2025	Official Repository for Vision Language Model Interpretability with Concept...	13	Experimental	—	Jupyter Notebook
28	pvicinanza/llm_prompt_tuning_conspiracies	This repository provides the data and code needed to replicate "Semantic...	11	Experimental	—	Jupyter Notebook
29	ChuanMeng/SIP	Code for the CIKM 2023 long paper: System Initiative Prediction for...	11	Experimental	3	Python
30	vbainwala/Benchmarking-LLMs-Indic-Languages	Benchmarking Study of Bloomz-560m, mBART-large, IndicBART on the Indic Languages	11	Experimental	—	Python