xmed-lab/TAM
[ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs
When analyzing what a multimodal AI model sees in an image or video, this tool helps you understand precisely why it generated certain words. It takes your image or video and the model's text output, then shows you exactly which parts of the visual input "activated" each word. This is useful for AI researchers or anyone needing to debug or interpret multimodal AI models.
Use this if you need to visualize and explain the exact visual evidence a multimodal large language model used to generate specific words or phrases.
Not ideal if you're looking for a tool to explain text-only AI models or if you don't need to understand the fine-grained visual reasoning behind multimodal AI outputs.
Stars: 180
Forks: 7
Language: Python
License: —
Last pushed: Dec 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/xmed-lab/TAM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
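If you'd rather call the endpoint from code than from curl, a minimal Python sketch follows. Only the base URL and path come from the curl example above; the response field names (`stars`, `forks`, `language`) are assumptions based on the stats shown on this page, not a documented schema.

```python
import json
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, repo: str) -> str:
    """Build the endpoint URL matching the curl example on this page."""
    # quote() keeps "/" unescaped by default, so "owner/repo" stays intact.
    return f"{BASE}/{quote(ecosystem)}/{quote(repo)}"

url = quality_url("transformers", "xmed-lab/TAM")
# → "https://pt-edge.onrender.com/api/v1/quality/transformers/xmed-lab/TAM"

# Hypothetical response shape (field names are assumed, not documented):
sample_response = '{"stars": 180, "forks": 7, "language": "Python"}'
data = json.loads(sample_response)
print(data["stars"])  # → 180
```

To actually fetch the data, pass `url` to `urllib.request.urlopen` or `requests.get`; remember the 100 requests/day limit on keyless access.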
Higher-rated alternatives
jessevig/bertviz
BertViz: Visualize Attention in Transformer Models
inseq-team/inseq
Interpretability for sequence generation models 🐛 🔍
EleutherAI/knowledge-neurons
A library for finding knowledge neurons in pretrained transformer models.
hila-chefer/Transformer-MM-Explainability
[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for...
cdpierse/transformers-interpret
Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model...