gszfwsb/AutoGnothi
Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models"
This tool helps AI researchers and practitioners build trust in complex AI models, especially those based on transformer architectures. It takes an existing "black-box" model and integrates a small additional component that lets the model explain its own decisions in human-understandable concepts. The result is a model that not only performs well but also transparently shows why it made a particular prediction, without the heavy computational cost of traditional post-hoc explanation methods.
No commits in the last 6 months.
Use this if you need to understand and explain the reasoning behind predictions made by your black-box transformer models, without a significant hit to performance or heavy computational overhead.
Not ideal if you are working with very simple models that are already inherently interpretable, or if you do not require explanations for your model's decisions.
Stars: 24
Forks: 1
Language: Python
License: —
Last pushed: Mar 04, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/gszfwsb/AutoGnothi"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
jessevig/bertviz
BertViz: Visualize Attention in Transformer Models
inseq-team/inseq
Interpretability for sequence generation models 🐛 🔍
EleutherAI/knowledge-neurons
A library for finding knowledge neurons in pretrained transformer models.
hila-chefer/Transformer-MM-Explainability
[ICCV 2021 Oral] Official PyTorch implementation for Generic Attention-model Explainability for...
cdpierse/transformers-interpret
Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model...