matt-seb-ho/WikiWhy

WikiWhy is a benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9,000+ "why" question-answer-rationale triplets.

27 / 100 (Experimental)

WikiWhy helps researchers and practitioners evaluate how well large language models can explain the causal relationships behind their answers. It provides over 9,000 'why' questions, answers, and detailed rationales grounded in Wikipedia facts. This benchmark is ideal for those developing or assessing AI systems that need to not only answer questions but also provide human-understandable explanations for cause-and-effect scenarios.

No commits in the last 6 months.

Use this if you are a researcher or AI developer who needs a robust dataset to benchmark how well your large language model explains cause-and-effect relationships.

Not ideal if you are looking for a dataset to pre-train a large language model, as its primary purpose is evaluation, with data separated to prevent contamination.
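To make the question-answer-rationale format described above concrete, here is a minimal sketch of how one item might be represented and turned into an evaluation prompt. The class `WhyExample`, its field names, the list-of-steps shape of `rationale`, and the helper `build_prompt` are all illustrative assumptions rather than the dataset's actual schema, and the sample item is made up, not taken from WikiWhy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhyExample:
    """One 'why' item. Field names are hypothetical, not WikiWhy's real columns."""
    question: str
    answer: str
    rationale: list[str]  # assumed: ordered explanation steps linking cause to effect


def build_prompt(example: WhyExample) -> str:
    """Format a prompt asking a model to answer and then explain its reasoning."""
    return (
        f"Question: {example.question}\n"
        "Give the answer, then explain step by step why it holds."
    )


# A made-up item for illustration only; it does not come from the dataset.
example = WhyExample(
    question="Why was the outdoor concert postponed?",
    answer="A thunderstorm was forecast for that evening.",
    rationale=[
        "Thunderstorms bring lightning, which is dangerous near stage equipment.",
        "Organizers postpone outdoor events when conditions are unsafe.",
    ],
)
prompt = build_prompt(example)
```

A model's response to `prompt` could then be compared against `example.answer` and `example.rationale`; how that comparison is scored is left open here.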

AI evaluation, LLM interpretability, causal reasoning, question answering, natural language processing
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 3 / 25

Stars: 48
Forks: 1
Language: Python
License: MIT
Last pushed: Dec 07, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/matt-seb-ho/WikiWhy"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
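The same request can be made from Python. The sketch below uses only the standard library and makes two assumptions: that the endpoint path follows a category/owner/repo pattern (inferred from the single URL shown above), and that the response body is JSON. The response fields are not documented here, so the code returns the parsed payload without interpreting it. The names `quality_url` and `fetch_quality` are hypothetical helpers.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the category/owner/repo pattern is an assumption."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record, assuming the server replies with JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a network request, so it is left commented out):
# record = fetch_quality("ml-frameworks", "matt-seb-ho", "WikiWhy")
# print(record)
```

Keeping URL construction separate from the network call makes the helper easy to check offline and keeps the request count within the stated daily limit during testing.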