hitz-zentroa/This-is-not-a-Dataset

We introduce a large, semi-automatically generated dataset of ~400,000 descriptive sentences about commonsense knowledge, each of which can be true or false. Negation is present, in different forms, in about 2/3 of the corpus. We use the dataset to evaluate LLMs.

27 / 100 (Experimental)

This project provides a large dataset of nearly 400,000 sentences about everyday knowledge, about two-thirds of which contain negation (e.g., "a cat is not a dog"). It helps researchers and AI developers evaluate how well large language models (LLMs) understand and process negation. You run an LLM against the dataset and obtain an evaluation of how well the model determines whether negated statements are true or false.
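As a rough illustration of the evaluation described above, the following minimal sketch scores a true/false predictor on a handful of sentences. The example sentences, the `accuracy` helper, and the `always_true` stand-in predictor are all hypothetical; they are not taken from the dataset or from the project's own evaluation code, and a real run would replace `always_true` with a call to an LLM.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical examples as (sentence, is_true) pairs.
# Illustrative only; these are not rows from the dataset.
EXAMPLES = [
    ("A cat is not a dog.", True),
    ("A cat is a dog.", False),
    ("A cat is an animal.", True),
    ("A cat is not an animal.", False),
]


def accuracy(predict: Callable[[str], bool],
             examples: Iterable[Tuple[str, bool]]) -> float:
    """Fraction of sentences whose predicted truth value matches the label."""
    examples = list(examples)
    correct = sum(predict(sentence) == label for sentence, label in examples)
    return correct / len(examples)


def always_true(sentence: str) -> bool:
    """Stand-in for a model: answers True regardless of the sentence."""
    return True


score = accuracy(always_true, EXAMPLES)
print(f"accuracy: {score:.2f}")
```

A predictor that ignores negation entirely scores only at chance on a balanced set like this one, which is the kind of failure such a benchmark is meant to expose.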

No commits in the last 6 months.

Use this if you are an AI researcher or developer building or evaluating large language models and need a robust benchmark for negation understanding.

Not ideal if you are looking for a dataset for general language understanding or for training models on tasks unrelated to logical negation in commonsense knowledge.

AI evaluation natural-language-processing language-model-benchmarking computational-linguistics commonsense-reasoning
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 6 / 25
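
The four category scores listed here add up to the overall 27 / 100 figure. The short check below makes that arithmetic explicit; treating the overall score as a plain sum is an observation from the numbers on this page, not a documented formula.

```python
# Category scores as listed on this page, each out of 25.
scores = {"Maintenance": 0, "Adoption": 5, "Maturity": 16, "Community": 6}

total = sum(scores.values())
max_total = 25 * len(scores)

print(f"{total} / {max_total}")
```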

Stars: 13
Forks: 1
Language: Python
License: Apache-2.0
Last pushed: May 13, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/hitz-zentroa/This-is-not-a-Dataset"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
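
The same request can be made from Python using only the standard library. This is a minimal sketch: the helper names `build_url` and `fetch_quality` are hypothetical, calling the first path segment a "category" is a guess based on the curl URL, and the response is assumed (not confirmed) to be JSON.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, owner: str, repo: str) -> str:
    """Assemble an endpoint URL in the same shape as the curl example."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(url: str, timeout: float = 10.0):
    """GET the URL and decode the body, assuming the response is JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_url("transformers", "hitz-zentroa", "This-is-not-a-Dataset")
print(url)
# data = fetch_quality(url)  # uncomment to send the request (counts toward the daily limit)
```

The network call is left commented out so the snippet can be run without consuming any of the 100 keyless requests per day.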