google/BEGIN-dataset
A benchmark dataset for evaluating dialog system and natural language generation metrics.
This dataset helps evaluate how well AI dialogue systems attribute their responses to provided background knowledge. Each example combines the dialogue history, the AI's generated response, and the knowledge snippet the response should be based on, along with a label indicating whether the response is 'Fully attributable', 'Not fully attributable', or 'Generic'. It is aimed at researchers and developers working on conversational AI and natural language generation, specifically those building reliable, grounded chatbots or virtual assistants.
No commits in the last 6 months.
Use this if you need a benchmark to assess the 'groundedness' and attribution quality of your dialogue system's responses against human judgments.
Not ideal if you are looking for a dataset to train a dialogue system from scratch, as this is designed for evaluation, not direct training.
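As a rough illustration of how such a benchmark can be used, the sketch below scores a toy metric against the three labels named in the description. The field names (`history`, `knowledge`, `response`, `label`), the token-overlap metric, the 0.5 threshold, and the choice to treat only 'Fully attributable' as the positive class are all assumptions made for this example; they are not taken from the dataset's files or documentation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# The three labels named in the dataset description.
LABELS = ("Fully attributable", "Not fully attributable", "Generic")


@dataclass(frozen=True)
class Example:
    """One evaluation item. Field names are hypothetical."""
    history: str
    knowledge: str
    response: str
    label: str


def token_overlap(example: Example) -> float:
    """Toy metric: fraction of response tokens that also occur in the knowledge snippet."""
    response_tokens = example.response.lower().split()
    knowledge_tokens = set(example.knowledge.lower().split())
    if not response_tokens:
        return 0.0
    return sum(tok in knowledge_tokens for tok in response_tokens) / len(response_tokens)


def agreement(
    examples: Iterable[Example],
    metric: Callable[[Example], float],
    threshold: float = 0.5,
) -> float:
    """Share of examples where (metric >= threshold) matches label == 'Fully attributable'."""
    items = list(examples)
    if not items:
        raise ValueError("need at least one example")
    hits = sum(
        (metric(ex) >= threshold) == (ex.label == LABELS[0])
        for ex in items
    )
    return hits / len(items)
```

A real evaluation would swap `token_overlap` for the metric under study and would likely report a correlation or per-label breakdown rather than a single agreement score.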
Stars: 39
Forks: 2
Language: —
License: —
Category:
Last pushed: Jun 13, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/google/BEGIN-dataset"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
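The same request can be made from a script. The sketch below uses only the Python standard library and the URL from the curl command above. The response format is not documented on this page, so the `parse_body` helper (a hypothetical name) merely tries JSON and falls back to the raw text; treat that as an assumption rather than a description of the API.

```python
import json
import urllib.request
from typing import Any

# URL copied from the curl command on this page.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/google/BEGIN-dataset"


def parse_body(text: str) -> Any:
    """Return parsed JSON if the body is valid JSON, otherwise the raw text.

    Assumption: the endpoint may return JSON; this page does not say so.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


def fetch_listing(url: str = API_URL, timeout: float = 10.0) -> Any:
    """Fetch the listing without an API key (limited to 100 requests/day)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return parse_body(resp.read().decode(charset))


if __name__ == "__main__":
    print(fetch_listing())
```

Given the daily request limit, caching the result locally is a sensible precaution when calling this from a loop or a scheduled job.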
Higher-rated alternatives
luheng/deep_srl
Code and pre-trained model for: Deep Semantic Role Labeling: What Works and What's Next
sileod/tasksource
Datasets collection and preprocessings framework for NLP extreme multitask learning
loomchild/maligna
Bilingual sentence aligner
CK-Explorer/DuoSubs
Semantic subtitle aligner and merger for bilingual subtitle syncing.
coastalcph/lex-glue
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English