TheBuleGanteng/interpretability-prototyping
This project is an educational exploration of Large Language Model (LLM) interpretability techniques, specifically focusing on Sparse Autoencoders (SAEs) as demonstrated in Anthropic's research: Scaling Monosemanticity.
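To make the technique concrete, here is a minimal, hypothetical sketch of a sparse autoencoder of the kind explored in that line of research: a ReLU encoder over a wide "dictionary" of features plus a linear decoder, trained against a reconstruction loss with an L1 sparsity penalty. This numpy-only illustration is not code from the repository; all dimensions and coefficients are illustrative assumptions.

```python
# Minimal sparse autoencoder (SAE) sketch -- hypothetical, not from the repo.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae = 8, 32          # activation width, SAE dictionary size (assumed)
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def encode(x):
    # ReLU encoder: many feature activations land at exactly zero (sparsity)
    return np.maximum(0.0, x @ W_enc + b_enc)

def decode(h):
    # Linear decoder reconstructs the original activation vector
    return h @ W_dec + b_dec

def loss(x, l1_coeff=1e-3):
    h = encode(x)
    x_hat = decode(h)
    recon = np.mean((x - x_hat) ** 2)       # reconstruction error
    sparsity = l1_coeff * np.abs(h).sum()   # L1 penalty encourages sparse codes
    return recon + sparsity

x = rng.normal(size=(d_model,))
h = encode(x)
print(f"active features: {int((h > 0).sum())} / {d_sae}")
print(f"loss: {loss(x):.4f}")
```

In practice the dictionary is much wider than the model's activation space and the SAE is trained on activations sampled from a real LLM; the sketch only shows the objective's shape.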
Stars: —
Forks: —
Language: Jupyter Notebook
License: MIT
Last pushed: Mar 18, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/TheBuleGanteng/interpretability-prototyping"
Open to everyone: 100 requests/day with no key; a free key raises the limit to 1,000/day.
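The same endpoint can be called from Python with the standard library. The URL below is taken from the curl command above; the shape of the JSON payload is an assumption, so the sketch only builds the URL and decodes whatever JSON comes back.

```python
# Hypothetical client sketch for the stats endpoint shown in the curl example.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"

def stats_url(owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL (pattern taken from the curl example)."""
    return f"{BASE}/{owner}/{repo}"

def fetch_stats(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload (makes a network call)."""
    with urllib.request.urlopen(stats_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(stats_url("TheBuleGanteng", "interpretability-prototyping"))
    # Uncomment to hit the live API (counts against the daily limit):
    # print(fetch_stats("TheBuleGanteng", "interpretability-prototyping"))
```

The network call is left commented out so the sketch can run offline; each request counts against the 100/day anonymous quota.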
Higher-rated alternatives
obss/sahi: Framework agnostic sliced/tiled inference + interactive ui + error analysis plots
tensorflow/tcav: Code for the TCAV ML interpretability project
MAIF/shapash: 🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent...
TeamHG-Memex/eli5: A library for debugging/inspecting machine learning classifiers and explaining their predictions
csinva/imodels: Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling...