nlpkeg/Know-MRI
This is the official code for the [ACL 2025 Demo] paper: Know-MRI: A Knowledge Mechanisms Revealer&Interpreter for Large Language Models.
This toolkit helps researchers and practitioners understand how Large Language Models (LLMs) arrive at their answers. You input a pre-trained LLM (like Llama2 or GPT-J) and a dataset (such as those used for factual recall or math problems), and it outputs detailed explanations, images, and tables showing the internal workings and knowledge mechanisms of the LLM. It's designed for anyone trying to interpret and debug LLM behavior.
Use this if you need to deeply analyze and interpret the reasoning or 'knowledge' within various large language models using a range of established interpretability methods.
Not ideal if you are looking for a simple, black-box performance evaluation or if you only need to fine-tune an LLM without understanding its internal knowledge mechanisms.
Stars: 14
Forks: 2
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jan 08, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/nlpkeg/Know-MRI"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
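The same request can be made from Python. This is a minimal sketch using only the standard library. It assumes the endpoint returns JSON, since the response schema is not described here. The helper names `build_quality_url` and `fetch_quality` are hypothetical and not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is owner/repo.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the request URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumption: the body is JSON. If the service returns something else,
    json.loads raises a ValueError and the caller should handle it.
    """
    request = urllib.request.Request(
        build_quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# print(json.dumps(fetch_quality("nlpkeg", "Know-MRI"), indent=2))
```

Keeping URL construction separate from the network call makes the URL easy to check without spending any of the 100 unauthenticated requests per day.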
Higher-rated alternatives
MadryLab/context-cite
Attribute (or cite) statements generated by LLMs back to in-context information.
microsoft/augmented-interpretable-models
Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible.
Trustworthy-ML-Lab/CB-LLMs
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with...
poloclub/LLM-Attributor
LLM Attributor: Attribute LLM's Generated Text to Training Data
THUDM/LongCite
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA