google/BEGIN-dataset

A benchmark dataset for evaluating dialog system and natural language generation metrics.

Quality score: 29 / 100 (Experimental)

This dataset helps evaluate how well AI dialogue systems attribute their responses to provided background knowledge. Each example pairs a dialogue history and a knowledge snippet with a system-generated response, annotated with a human judgment of whether the response is 'Fully attributable', 'Not fully attributable', or 'Generic'. It is aimed at researchers and developers working on conversational AI and natural language generation, specifically those building reliable, knowledge-grounded chatbots or virtual assistants.
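In practice, evaluating against a benchmark like this reduces to comparing a system's predicted label with the human annotation for each (history, knowledge, response) triple. A minimal sketch in Python — the field names and the toy examples below are invented for illustration and are not the dataset's actual schema:

```python
from collections import Counter

# The three BEGIN label classes.
LABELS = {"Fully attributable", "Not fully attributable", "Generic"}

def score_predictions(examples, predict):
    """Compare predicted labels to gold labels; return accuracy and (gold, pred) counts."""
    hits, confusion = 0, Counter()
    for ex in examples:
        pred = predict(ex["history"], ex["knowledge"], ex["response"])
        assert pred in LABELS, f"unknown label: {pred}"
        confusion[(ex["gold"], pred)] += 1
        hits += pred == ex["gold"]
    return hits / len(examples), confusion

# Toy examples (invented for illustration; not drawn from BEGIN).
examples = [
    {"history": "Who wrote Hamlet?",
     "knowledge": "Hamlet was written by Shakespeare.",
     "response": "Shakespeare wrote it.", "gold": "Fully attributable"},
    {"history": "Who wrote Hamlet?",
     "knowledge": "Hamlet was written by Shakespeare.",
     "response": "I think it was Marlowe.", "gold": "Not fully attributable"},
    {"history": "Who wrote Hamlet?",
     "knowledge": "Hamlet was written by Shakespeare.",
     "response": "Good question!", "gold": "Generic"},
]

# A trivial baseline that labels every response "Generic".
accuracy, confusion = score_predictions(examples, lambda h, k, r: "Generic")
print(accuracy)
```

Swapping the lambda for a real attribution classifier gives a benchmark score comparable across systems.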

No commits in the last 6 months.

Use this if you need a benchmark to assess the 'groundedness' and attribution quality of your dialogue system's responses against human judgments.

Not ideal if you are looking for a dataset to train a dialogue system from scratch, as this is designed for evaluation, not direct training.

conversational-ai natural-language-generation dialogue-systems chatbot-evaluation response-attribution
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 6 / 25


Stars: 39
Forks: 2
Language: (not listed)
License: (not listed)
Last pushed: Jun 13, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/google/BEGIN-dataset"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.