Trustworthy-ML-Lab/CB-LLMs

[ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.

Score: 44 / 100 (Emerging)

This framework helps AI developers build Large Language Models (LLMs) that are transparent and explainable. Instead of a 'black box' that only produces answers, the system trains on text data and yields an LLM that not only generates or classifies text but also shows *why* it made a particular decision, expressed through human-understandable concepts. It is aimed at machine learning engineers and researchers who need to ensure the safety, reliability, and trustworthiness of their LLM applications.
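To make the concept-bottleneck idea concrete, here is a minimal sketch in plain PyTorch. It is not the repo's actual code: the class name, concept list, and dimensions are all hypothetical. The point it illustrates is that every prediction is forced through a small layer of named concept scores, and those scores are what explain the output.

```python
import torch
import torch.nn as nn

# Hypothetical concept set -- CB-LLMs derives its own concepts; these are
# placeholders for illustration only.
CONCEPTS = ["positive sentiment", "negative sentiment", "sports", "politics"]

class ConceptBottleneckHead(nn.Module):
    """Toy concept-bottleneck classifier head (not the repo's API)."""

    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        # Project the LM embedding onto one score per named concept.
        self.to_concepts = nn.Linear(embed_dim, len(CONCEPTS))
        # The final label is a linear function of the concept scores ONLY,
        # so the concept scores fully mediate the prediction.
        self.classifier = nn.Linear(len(CONCEPTS), num_classes)

    def forward(self, embedding: torch.Tensor):
        concept_scores = self.to_concepts(embedding)
        logits = self.classifier(concept_scores)
        return logits, concept_scores

head = ConceptBottleneckHead(embed_dim=768, num_classes=2)
logits, scores = head(torch.randn(1, 768))  # stand-in for a real LM embedding
for name, score in zip(CONCEPTS, scores[0].tolist()):
    print(f"{name}: {score:+.3f}")  # the 'why' behind the prediction
```

Inspecting the concept scores for a given input is what makes the decision interpretable; CB-LLMs builds this kind of bottleneck into the LLM itself rather than explaining it post hoc.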

Use this if you need to develop an LLM for text generation or classification where understanding the model's reasoning and ensuring its reliability are crucial.

Not ideal if your sole concern is raw predictive accuracy, with no need for transparency or interpretability in the model's decisions.

Tags: AI development, NLP applications, Model interpretability, Machine learning engineering, Trustworthy AI
No License · No Package · No Dependents
Maintenance: 10 / 25
Adoption: 7 / 25
Maturity: 8 / 25
Community: 19 / 25
(These four components sum to the overall score of 44 / 100.)

Stars: 31
Forks: 18
Language: Python
License: None
Last pushed: Feb 05, 2026
Commits (30d): 0

Get this data via API:

```bash
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Trustworthy-ML-Lab/CB-LLMs"
```

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
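
For programmatic access, here is a minimal Python sketch using only the standard library. The endpoint is the one shown in the curl example above; the structure of the returned JSON is an assumption, so inspect the raw response before relying on specific fields.

```python
import json
import urllib.request

# Endpoint copied from the curl example above (no key needed at the
# 100 requests/day tier; how a paid key is attached is not documented here).
URL = (
    "https://pt-edge.onrender.com/api/v1/quality/"
    "transformers/Trustworthy-ML-Lab/CB-LLMs"
)

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)  # assumed to be a JSON object of quality metrics

print(json.dumps(data, indent=2))
```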