nlpkeg/Know-MRI

Official code for the ACL 2025 Demo paper "Know-MRI: A Knowledge Mechanisms Revealer&Interpreter for Large Language Models".

Emerging

This toolkit helps researchers and practitioners understand how Large Language Models (LLMs) arrive at their answers. You input a pre-trained LLM (like Llama2 or GPT-J) and a dataset (such as those used for factual recall or math problems), and it outputs detailed explanations, images, and tables showing the internal workings and knowledge mechanisms of the LLM. It's designed for anyone trying to interpret and debug LLM behavior.

Use this if you need to deeply analyze and interpret the reasoning or 'knowledge' within various large language models using a range of established interpretability methods.

Not ideal if you are looking for a simple, black-box performance evaluation or if you only need to fine-tune an LLM without understanding its internal knowledge mechanisms.

AI interpretability, LLM analysis, model debugging, natural language processing research, knowledge representation

No package, no dependents

Maintenance 6 / 25

Adoption 5 / 25

Maturity 16 / 25

Community 10 / 25

Language

Jupyter Notebook

License

MIT

Higher-rated alternatives

MadryLab/context-cite

Attribute (or cite) statements generated by LLMs back to in-context information.

microsoft/augmented-interpretable-models

Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible.

Trustworthy-ML-Lab/CB-LLMs

[ICLR 25] A novel framework for building intrinsically interpretable LLMs with...

poloclub/LLM-Attributor

LLM Attributor: Attribute LLM's Generated Text to Training Data

THUDM/LongCite

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
