itsqyh/Awesome-LMMs-Mechanistic-Interpretability
A curated collection of resources on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). The repository aggregates surveys, blog posts, and research papers that explore how LMMs represent, transform, and align multimodal information internally.
It is aimed at AI researchers and practitioners studying how these models process and link different modalities, such as images and text, and at anyone focused on model transparency and trustworthiness.
Use this if you are researching or developing Large Multimodal Models and need to explore how they make decisions or represent information internally.
Not ideal if you are looking for ready-to-use code or tools to apply LMMs for practical tasks without needing to understand their internal mechanics.
Stars: 192
Forks: 5
Language: —
License: —
Category: —
Last pushed: Mar 04, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/itsqyh/Awesome-LMMs-Mechanistic-Interpretability"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
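If you prefer programmatic access over curl, here is a minimal Python sketch using only the standard library. It assumes the endpoint returns JSON; the exact response fields are not documented here, so the example simply prints whatever comes back.

# Minimal sketch for querying the quality API, assuming a JSON response.
# The response schema is not documented on this page, so we just print it.
import json
import urllib.request

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/itsqyh/Awesome-LMMs-Mechanistic-Interpretability")

with urllib.request.urlopen(URL) as resp:  # no key needed up to 100 requests/day
    data = json.load(resp)

print(json.dumps(data, indent=2))  # inspect the fields the API actually returns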
Higher-rated alternatives
MadryLab/context-cite
Attribute (or cite) statements generated by LLMs back to in-context information.
microsoft/augmented-interpretable-models
Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible.
Trustworthy-ML-Lab/CB-LLMs
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with...
poloclub/LLM-Attributor
LLM Attributor: Attribute LLM's Generated Text to Training Data
THUDM/LongCite
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA