matt-seb-ho/WikiWhy

WikiWhy is a new benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9000+ "why" question-answer-rationale triplets.

/ 100

Experimental

WikiWhy helps researchers and practitioners evaluate how well large language models can explain the causal relationships behind their answers. It provides over 9,000 'why' questions, answers, and detailed rationales grounded in Wikipedia facts. This benchmark is ideal for those developing or assessing AI systems that need to not only answer questions but also provide human-understandable explanations for cause-and-effect scenarios.
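As a rough illustration of what a question-answer-rationale triplet might look like in code, here is a minimal Python sketch. The field names (`question`, `answer`, `rationale`), the sample record, and the unigram-overlap F1 metric are all assumptions made for illustration; they are not taken from the WikiWhy repository, whose actual schema and evaluation metrics may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class WhyTriplet:
    # Hypothetical field names; the real dataset columns may differ.
    question: str
    answer: str
    rationale: List[str]  # ordered explanation steps linking cause to effect


def token_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between two strings (a simple stand-in metric)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def score_explanation(example: WhyTriplet, generated: str) -> float:
    """Compare a model-generated explanation with the reference rationale."""
    return token_f1(generated, " ".join(example.rationale))


# Made-up record for illustration only; not an entry from the dataset.
example = WhyTriplet(
    question="Why was the bridge closed?",
    answer="Because of high winds.",
    rationale=["High winds make the bridge unsafe.", "Unsafe bridges are closed."],
)
```

Calling `score_explanation(example, model_output)` returns a value between 0 and 1. Token overlap is only a crude proxy for explanation quality, so a real evaluation would likely rely on the benchmark's own metrics or on human judgment.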

No commits in the last 6 months.

Use this if you are a researcher or AI developer who needs a robust dataset to benchmark how well your large language model explains cause-and-effect relationships.

Not ideal if you are looking for a dataset to pre-train a large language model, as its primary purpose is evaluation, with data separated to prevent contamination.

AI evaluation, LLM interpretability, causal reasoning, question answering, natural language processing

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 3 / 25

Language: Python

License: MIT

Higher-rated alternatives

obss/sahi

Framework agnostic sliced/tiled inference + interactive ui + error analysis plots

tensorflow/tcav

Code for the TCAV ML interpretability project

MAIF/shapash

🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent...

TeamHG-Memex/eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions

csinva/imodels

Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling...
