AI4LIFE-GROUP/LLM_Explainer
Code for paper: Are Large Language Models Post Hoc Explainers?
This project helps machine learning practitioners understand why their classification models make certain decisions. It takes a trained model and a dataset, then uses large language models (LLMs) to generate human-readable explanations for individual predictions. The goal is to evaluate whether LLMs can effectively explain complex model behavior, providing insights to data scientists and domain experts.
No commits in the last 6 months.
Use this if you are a machine learning researcher or data scientist investigating the interpretability of your classification models, especially when exploring how large language models can generate post-hoc explanations.
Not ideal if you need a user-friendly, out-of-the-box explainability tool for immediate deployment in a business application, as this project is research-focused and requires a technical understanding of ML pipelines and LLM prompting.
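To make the idea concrete, here is a minimal, hypothetical sketch of the general approach the paper studies: perturb an input, record the model's predictions, and pack the (input, prediction) pairs into a natural-language prompt that an LLM can reason over. The toy classifier, function names, and prompt wording below are illustrative assumptions, not the repository's actual pipeline.

```python
import random

def toy_model(x):
    # Hypothetical stand-in for a trained classifier: predicts 1 when
    # a weighted sum of the features crosses zero (weights are arbitrary).
    weights = [2.0, -0.5, 0.1]
    return int(sum(w * xi for w, xi in zip(weights, x)) > 0)

def build_explanation_prompt(model, instance, n_perturbations=5, scale=0.1, seed=0):
    """Perturb the instance with Gaussian noise, query the model on each
    perturbation, and format the pairs into a prompt for an LLM."""
    rng = random.Random(seed)
    lines = ["Each line shows a feature vector and a classifier's prediction."]
    for _ in range(n_perturbations):
        perturbed = [xi + rng.gauss(0, scale) for xi in instance]
        pred = model(perturbed)
        feats = ", ".join(f"{v:.3f}" for v in perturbed)
        lines.append(f"Input: [{feats}] -> Prediction: {pred}")
    lines.append("Which feature most influences the prediction? Answer with its index.")
    return "\n".join(lines)

prompt = build_explanation_prompt(toy_model, [0.5, -1.0, 2.0])
print(prompt)
```

The resulting prompt would then be sent to an LLM, whose answer is compared against standard post-hoc explainers (e.g., feature-importance methods) to judge agreement.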
Stars
34
Forks
5
Language
Jupyter Notebook
License
MIT
Category
Last pushed
Jul 22, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/AI4LIFE-GROUP/LLM_Explainer"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
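The same endpoint can be queried from Python. This sketch only builds the request URL shown in the curl example above; the response format (presumably JSON) and any other endpoints are assumptions, so fetch with `urllib.request.urlopen(url)` and inspect the body yourself.

```python
from urllib.parse import quote

# Base endpoint taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner, repo):
    # Build the per-repository URL; path segments are percent-encoded
    # defensively (plain owner/repo names pass through unchanged).
    return f"{BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"

url = quality_url("AI4LIFE-GROUP", "LLM_Explainer")
print(url)
```

With a free API key, the key would typically be passed as a header or query parameter; check the service's documentation for the exact mechanism.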
Higher-rated alternatives
MadryLab/context-cite
Attribute (or cite) statements generated by LLMs back to in-context information.
microsoft/augmented-interpretable-models
Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible.
Trustworthy-ML-Lab/CB-LLMs
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with...
poloclub/LLM-Attributor
LLM Attributor: Attribute LLM's Generated Text to Training Data
THUDM/LongCite
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA