gszfwsb/AutoGnothi
Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models"
This tool helps AI researchers and practitioners build trust in complex AI models, especially those based on transformer architectures. It takes an existing "black-box" model and integrates a small additional component that lets the model explain its own decisions in human-understandable concepts. The result is a model that not only performs well but also transparently shows why it made a particular prediction, without the heavy computational cost of traditional post-hoc explanation methods.
No commits in the last 6 months.
Use this if you need to understand and explain the reasoning behind predictions made by your black-box transformer models, without a significant hit to performance or heavy computational overhead.
Not ideal if you are working with very simple models that are already inherently interpretable, or if you do not require explanations for your model's decisions.
Stars: 24
Forks: 1
Language: Python
License: —
Last pushed: Mar 04, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/gszfwsb/AutoGnothi"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
jessevig/bertviz
BertViz: Visualize Attention in Transformer Models
inseq-team/inseq
Interpretability for sequence generation models 🐛 🔍
EleutherAI/knowledge-neurons
A library for finding knowledge neurons in pretrained transformer models.
hila-chefer/Transformer-MM-Explainability
[ICCV 2021 Oral] Official PyTorch implementation for Generic Attention-model Explainability for...
cdpierse/transformers-interpret
Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model...