google-research-datasets/query-wellformedness

25,100 queries from the Paralex corpus (Fader et al., 2013) annotated with human ratings of whether they are well-formed natural language questions.

Quality score: 30 / 100 (Emerging)

This project provides a collection of 25,100 questions drawn from the Paralex corpus, each rated by multiple annotators on how well-formed it is as a natural language question. Each entry pairs a raw query with a human well-formedness score, which makes the dataset useful as a benchmark for understanding what distinguishes a clear, grammatical question from a keyword-style query. It suits anyone designing or evaluating systems that need to understand or generate human-like questions, such as AI trainers or conversational bot designers.

No commits in the last 6 months.

Use this if you need to train or evaluate a system that processes or generates natural language questions and requires a benchmark for well-formedness.

Not ideal if you're looking for factual answers to questions or a dataset of domain-specific inquiries.
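As a sketch of how a dataset like this is typically consumed, assuming the TSV layout the repo uses (one `question<TAB>rating` pair per line, with the rating in [0, 1] averaged over several annotators), the sample lines and the 0.8 well-formedness threshold below are illustrative assumptions, not values taken from this page:

```python
# Load question/rating pairs and binarize the rating.
# The TSV layout and the 0.8 threshold are assumptions about the
# dataset's conventions; the sample lines are made up for illustration.
import csv
import io

sample_tsv = (
    "what is the population of france ?\t1.0\n"
    "population france how many\t0.2\n"
)

def load_wellformedness(fileobj, threshold=0.8):
    """Yield (question, rating, is_wellformed) triples.

    A question counts as well-formed when its averaged annotator
    rating meets the threshold.
    """
    for question, rating in csv.reader(fileobj, delimiter="\t"):
        r = float(rating)
        yield question, r, r >= threshold

rows = list(load_wellformedness(io.StringIO(sample_tsv)))
print(rows[0])  # grammatical question, rated well-formed
print(rows[1])  # keyword-style query, rated not well-formed
```

Replacing `io.StringIO(sample_tsv)` with an open file handle on the dataset's actual TSV splits would load the real data the same way.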

natural-language-understanding conversational-ai chatbot-development ai-training question-answering-systems
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 13 / 25

How are scores calculated?

Stars: 85
Forks: 11
Language: (not listed)
License: (not listed)
Last pushed: Oct 09, 2018
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/google-research-datasets/query-wellformedness"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
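The same request can be made from Python with only the standard library. A minimal sketch follows; only the endpoint URL comes from the curl command above, and the response schema is not documented here, so the helper simply returns the parsed JSON body:

```python
# Fetch a repo's quality record from the API shown above.
# The URL structure (/quality/{ecosystem}/{owner}/{repo}) is inferred
# from the single example endpoint; the response shape is unknown,
# so we return the raw parsed JSON.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem, owner, repo):
    """Build the API URL for a given ecosystem/owner/repo triple."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem, owner, repo):
    """GET the quality record and parse it as JSON."""
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    data = fetch_quality("nlp", "google-research-datasets",
                         "query-wellformedness")
    print(json.dumps(data, indent=2))
```

The `__main__` guard keeps the network call out of imports, so the helpers can be reused (and rate-limited requests batched) without firing a request on load.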