TheBuleGanteng/interpretability-prototyping
This project is an educational exploration of Large Language Model (LLM) interpretability techniques, specifically focusing on Sparse Autoencoders (SAEs) as demonstrated in Anthropic's research: Scaling Monosemanticity.
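To make the technique concrete, here is a minimal, hypothetical sketch of a sparse autoencoder of the kind explored in that line of research: a ReLU encoder over a wide "dictionary" of features plus a linear decoder, trained against a reconstruction loss with an L1 sparsity penalty. This numpy-only illustration is not code from the repository; all dimensions and coefficients are illustrative assumptions.

```python
# Minimal sparse autoencoder (SAE) sketch -- hypothetical, not from the repo.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae = 8, 32          # activation width, SAE dictionary size (assumed)
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def encode(x):
    # ReLU encoder: many feature activations land at exactly zero (sparsity)
    return np.maximum(0.0, x @ W_enc + b_enc)

def decode(h):
    # Linear decoder reconstructs the original activation vector
    return h @ W_dec + b_dec

def loss(x, l1_coeff=1e-3):
    h = encode(x)
    x_hat = decode(h)
    recon = np.mean((x - x_hat) ** 2)       # reconstruction error
    sparsity = l1_coeff * np.abs(h).sum()   # L1 penalty encourages sparse codes
    return recon + sparsity

x = rng.normal(size=(d_model,))
h = encode(x)
print(f"active features: {int((h > 0).sum())} / {d_sae}")
print(f"loss: {loss(x):.4f}")
```

In practice the dictionary is much wider than the model's activation space and the SAE is trained on activations sampled from a real LLM; the sketch only shows the objective's shape.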
Stars: —
Forks: —
Language: Jupyter Notebook
License: MIT
Last pushed: Mar 18, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/TheBuleGanteng/interpretability-prototyping"
Open to everyone: 100 requests/day with no key; a free key raises the limit to 1,000/day.
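The same endpoint can be called from Python with the standard library. The URL below is taken from the curl command above; the shape of the JSON payload is an assumption, so the sketch only builds the URL and decodes whatever JSON comes back.

```python
# Hypothetical client sketch for the stats endpoint shown in the curl example.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"

def stats_url(owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL (pattern taken from the curl example)."""
    return f"{BASE}/{owner}/{repo}"

def fetch_stats(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload (makes a network call)."""
    with urllib.request.urlopen(stats_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(stats_url("TheBuleGanteng", "interpretability-prototyping"))
    # Uncomment to hit the live API (counts against the daily limit):
    # print(fetch_stats("TheBuleGanteng", "interpretability-prototyping"))
```

The network call is left commented out so the sketch can run offline; each request counts against the 100/day anonymous quota.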
Higher-rated alternatives
obss/sahi: Framework agnostic sliced/tiled inference + interactive ui + error analysis plots
tensorflow/tcav: Code for the TCAV ML interpretability project
MAIF/shapash: 🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent...
TeamHG-Memex/eli5: A library for debugging/inspecting machine learning classifiers and explaining their predictions
csinva/imodels: Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling...