taufeeque9/codebook-features
Sparse and discrete interpretability tool for neural networks
This project helps machine learning practitioners understand and control how their neural networks make decisions. It takes a pre-trained neural network and converts it into a "codebook model," allowing you to see which internal "codes" are activated by specific input patterns. The output is a more transparent neural network, along with tools to visualize and even manipulate its internal workings. It is ideal for AI researchers, ML engineers, or data scientists working with complex neural networks who need to explain or steer their models.
No commits in the last 6 months. Available on PyPI.
Use this if you need to interpret why your neural network produces a particular output, debug unexpected model behavior, or causally influence your model's predictions by activating or deactivating specific internal features.
Not ideal if you are looking for a general-purpose model training framework or if your primary goal is to improve model accuracy without needing detailed insights into its internal decision-making process.
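To make the idea of "codes" concrete, here is a minimal sketch of what a codebook layer might do. It is not the codebook-features API. It assumes, for illustration only, that an activation vector is replaced by the sum of the k codes most similar to it by cosine similarity. The names `quantize`, `codebook`, and `disabled` are hypothetical, and the codebook is hand-written, whereas a real model would learn it.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def quantize(activation, codebook, k=2, disabled=()):
    """Replace an activation with the sum of its k most similar codes.

    Returns (active_code_ids, quantized_vector). Codes listed in
    `disabled` are never selected, which mimics switching a feature off.
    """
    candidates = [i for i in range(len(codebook)) if i not in disabled]
    ranked = sorted(
        candidates,
        key=lambda i: cosine(activation, codebook[i]),
        reverse=True,
    )
    active = ranked[:k]
    quantized = [
        sum(codebook[i][d] for i in active) for d in range(len(activation))
    ]
    return active, quantized


# Toy codebook (hypothetical values; a trained model learns these).
codebook = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [-1.0, 0.0, 0.0],
]
activation = [0.9, 0.4, 0.1]

# Inspect which codes fire for this activation.
active, quantized = quantize(activation, codebook, k=2)

# Intervene: deactivate code 0 and see which codes take its place.
ablated_active, ablated = quantize(activation, codebook, k=2, disabled={0})

print(active, quantized)
print(ablated_active, ablated)
```

The list of active code ids is what makes the layer inspectable: each input maps to a small, discrete set of codes. Removing a code from the candidate set changes the vector passed downstream, which is the sense in which a feature can be deactivated to steer the model.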
Stars: 64
Forks: 5
Language: Python
License: MIT
Last pushed: Feb 12, 2024
Commits (30d): 0
Dependencies: 17
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/taufeeque9/codebook-features"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
Higher-rated alternatives
jessevig/bertviz
BertViz: Visualize Attention in Transformer Models
inseq-team/inseq
Interpretability for sequence generation models 🐛 🔍
EleutherAI/knowledge-neurons
A library for finding knowledge neurons in pretrained transformer models.
hila-chefer/Transformer-MM-Explainability
[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for...
cdpierse/transformers-interpret
Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model...