taufeeque9/codebook-features

Sparse and discrete interpretability tool for neural networks

Emerging

This project helps machine learning practitioners understand and control how their neural networks make decisions. It takes a pre-trained neural network and converts it into a "codebook model," allowing you to see which internal "codes" are activated by specific input patterns. The output is a more transparent neural network, along with tools to visualize and even manipulate its internal workings. It is ideal for AI researchers, ML engineers, or data scientists working with complex neural networks who need to explain or steer their models.

No commits in the last 6 months. Available on PyPI.

Use this if you need to interpret why your neural network produces a particular output, debug unexpected model behavior, or causally influence your model's predictions by activating or deactivating specific internal features.

Not ideal if you are looking for a general-purpose model training framework or if your primary goal is to improve model accuracy without needing detailed insights into its internal decision-making process.
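
The idea of seeing which codes fire for an input, and of switching a code off to steer the output, can be sketched in a few lines. This is a minimal illustration and not the codebook-features package API. It assumes that a codebook layer replaces each activation with the sum of its k most similar codes under cosine similarity. The function names, the `disabled` parameter, and the toy two-dimensional codebook are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def quantize(activation, codebook, k=1, disabled=()):
    """Replace an activation with the sum of its k most similar codes.

    Returns (active_code_ids, quantized_vector). Codes listed in
    `disabled` are skipped, which mimics deactivating a feature.
    This scheme is an assumption made for illustration only.
    """
    candidates = [i for i in range(len(codebook)) if i not in disabled]
    ranked = sorted(
        candidates,
        key=lambda i: cosine(activation, codebook[i]),
        reverse=True,
    )
    active = ranked[:k]
    quantized = [
        sum(codebook[i][d] for i in active) for d in range(len(activation))
    ]
    return active, quantized


# Hypothetical codebook with three 2-D codes and one activation vector.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
activation = [0.9, 0.2]

# Interpretation: which code does this activation map to?
active, quantized = quantize(activation, codebook, k=1)

# Intervention: disable that code and see which one takes over.
ablated_active, ablated = quantize(activation, codebook, k=1, disabled={0})
```

Here `active` names the discrete code that stands in for the activation, which is what makes the layer inspectable. Passing that code in `disabled` forces the layer to fall back to the next-best code, a simple stand-in for a causal intervention.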

AI interpretability, neural network debugging, model explainability, causal AI, large language model analysis

Maintenance 0 / 25

Adoption 8 / 25

Maturity 25 / 25

Community 9 / 25

Language: Python

License: MIT

Higher-rated alternatives

jessevig/bertviz

BertViz: Visualize Attention in Transformer Models

inseq-team/inseq

Interpretability for sequence generation models 🐛 🔍

EleutherAI/knowledge-neurons

A library for finding knowledge neurons in pretrained transformer models.

hila-chefer/Transformer-MM-Explainability

[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for...

cdpierse/transformers-interpret

Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model...
