matt-seb-ho/WikiWhy
WikiWhy is a new benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9,000+ "why" question-answer-rationale triplets.
WikiWhy helps researchers and practitioners evaluate how well large language models can explain the causal relationships behind their answers. It provides over 9,000 'why' questions, answers, and detailed rationales grounded in Wikipedia facts. This benchmark is ideal for those developing or assessing AI systems that need to not only answer questions but also provide human-understandable explanations for cause-and-effect scenarios.
No commits in the last 6 months.
Use this if you are a researcher or AI developer who needs a robust dataset to benchmark how well your large language model explains cause-and-effect relationships.
Not ideal if you are looking for a dataset to pre-train a large language model, as its primary purpose is evaluation, with data separated to prevent contamination.
Stars: 48
Forks: 1
Language: Python
License: MIT
Last pushed: Dec 07, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/matt-seb-ho/WikiWhy"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
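The curl call above can also be made from Python. The following is a minimal sketch using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, and the code assumes (unverified) that the endpoint returns a JSON body; the response fields are not documented here, so the sketch returns the parsed payload without interpreting it.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is
# "<category>/<owner>/<repo>".
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository (hypothetical helper)."""
    # Keep the "/" between owner and repo name; escape anything unusual.
    return f"{BASE_URL}/{quote(category, safe='')}/{quote(repo, safe='/')}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumption: the response body is JSON. If it is not,
    json.loads raises a ValueError.
    """
    url = quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "matt-seb-ho/WikiWhy")` issues the same request as the curl command. With the keyless limit of 100 requests/day, it is worth caching the result rather than refetching it on every run.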
Higher-rated alternatives
obss/sahi
Framework agnostic sliced/tiled inference + interactive ui + error analysis plots
tensorflow/tcav
Code for the TCAV ML interpretability project
MAIF/shapash
🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent...
TeamHG-Memex/eli5
A library for debugging/inspecting machine learning classifiers and explaining their predictions
csinva/imodels
Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling...