google/BEGIN-dataset

A benchmark dataset for evaluating dialog system and natural language generation metrics.

Quality score: 29 / 100 (Experimental)

This dataset helps evaluate how well AI dialogue systems attribute their responses to provided background knowledge. Each example pairs a dialogue history and a knowledge snippet with a system-generated response, annotated with a human judgment of whether the response is 'Fully attributable', 'Not fully attributable', or 'Generic'. It is aimed at researchers and developers working on conversational AI and natural language generation, specifically those building reliable, knowledge-grounded chatbots or virtual assistants.
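In practice, evaluating against a benchmark like this reduces to comparing a system's predicted label with the human annotation for each (history, knowledge, response) triple. A minimal sketch in Python — the field names and the toy examples below are invented for illustration and are not the dataset's actual schema:

```python
from collections import Counter

# The three BEGIN label classes.
LABELS = {"Fully attributable", "Not fully attributable", "Generic"}

def score_predictions(examples, predict):
    """Compare predicted labels to gold labels; return accuracy and (gold, pred) counts."""
    hits, confusion = 0, Counter()
    for ex in examples:
        pred = predict(ex["history"], ex["knowledge"], ex["response"])
        assert pred in LABELS, f"unknown label: {pred}"
        confusion[(ex["gold"], pred)] += 1
        hits += pred == ex["gold"]
    return hits / len(examples), confusion

# Toy examples (invented for illustration; not drawn from BEGIN).
examples = [
    {"history": "Who wrote Hamlet?",
     "knowledge": "Hamlet was written by Shakespeare.",
     "response": "Shakespeare wrote it.", "gold": "Fully attributable"},
    {"history": "Who wrote Hamlet?",
     "knowledge": "Hamlet was written by Shakespeare.",
     "response": "I think it was Marlowe.", "gold": "Not fully attributable"},
    {"history": "Who wrote Hamlet?",
     "knowledge": "Hamlet was written by Shakespeare.",
     "response": "Good question!", "gold": "Generic"},
]

# A trivial baseline that labels every response "Generic".
accuracy, confusion = score_predictions(examples, lambda h, k, r: "Generic")
print(accuracy)
```

Swapping the lambda for a real attribution classifier gives a benchmark score comparable across systems.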

No commits in the last 6 months.

Use this if you need a benchmark to assess the 'groundedness' and attribution quality of your dialogue system's responses against human judgments.

Not ideal if you are looking for a dataset to train a dialogue system from scratch, as this is designed for evaluation, not direct training.

conversational-ai natural-language-generation dialogue-systems chatbot-evaluation response-attribution
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 6 / 25


Stars: 39
Forks: 2
Language: (not listed)
License: (not listed)
Last pushed: Jun 13, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/google/BEGIN-dataset"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.