xmed-lab/TAM

[ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs

Quality score: 31 / 100 (Emerging)

When analyzing what a multimodal AI model sees in an image or video, this tool helps you understand precisely why it generated certain words. It takes your image or video and the model's text output, then shows you exactly which parts of the visual input "activated" each word. This is useful for AI researchers or anyone needing to debug or interpret multimodal AI models.
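The core idea of a token activation map can be sketched as a per-token heatmap over image patches. The snippet below is a minimal illustration, not the repo's actual API: the function name, the 16×16 patch grid, and the assumption that each generated token comes with a flat vector of activations over visual tokens are all hypothetical.

```python
import numpy as np

def token_activation_map(activations, grid=(16, 16), image_size=(224, 224)):
    """Turn one token's activations over visual patches into a pixel heatmap.

    activations: flat array of length grid[0] * grid[1] (hypothetical input
    shape; the real tool derives these from the model's internals).
    Returns an image_size heatmap normalized to [0, 1].
    """
    h, w = grid
    amap = np.asarray(activations, dtype=np.float64).reshape(h, w)
    # Nearest-neighbour upsample each patch to pixel resolution.
    sy, sx = image_size[0] // h, image_size[1] // w
    amap = np.repeat(np.repeat(amap, sy, axis=0), sx, axis=1)
    # Min-max normalize so the strongest activation maps to 1.
    amap -= amap.min()
    if amap.max() > 0:
        amap /= amap.max()
    return amap

# Synthetic example: activations peaking on a single patch.
acts = np.zeros(256)
acts[37] = 1.0
heat = token_activation_map(acts)
print(heat.shape, heat.max())  # (224, 224) 1.0
```

Overlaying such a heatmap on the input image (e.g. with an alpha-blended colormap) is what lets you see which regions "activated" a given word.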


Use this if you need to visualize and explain the exact visual evidence a multimodal large language model used to generate specific words or phrases.

Not ideal if you're looking for a tool to explain text-only AI models or if you don't need to understand the fine-grained visual reasoning behind multimodal AI outputs.

Tags: AI-explanation, model-debugging, multimodal-AI, AI-interpretability, computer-vision
No License · No Package · No Dependents
Maintenance: 6 / 25
Adoption: 10 / 25
Maturity: 7 / 25
Community: 8 / 25


Stars: 180
Forks: 7
Language: Python
License: None
Last pushed: Dec 14, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/xmed-lab/TAM"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.