google-research-datasets/query-wellformedness
25,100 queries from the Paralex corpus (Fader et al., 2013) annotated with human ratings of whether they are well-formed natural language questions.
This dataset contains 25,100 everyday questions, each rated by multiple annotators on whether it reads as a well-formed natural language question. Every entry pairs a raw question with its human well-formedness rating, making the dataset a benchmark for what counts as a clear, grammatical question. It is useful for anyone building or evaluating systems that must understand or generate human-like questions, such as conversational-bot designers or NLP researchers.
No commits in the last 6 months.
Use this if you need to train or evaluate a system that processes or generates natural language questions and requires a benchmark for well-formedness.
Not ideal if you're looking for factual answers to questions or a dataset of domain-specific inquiries.
Stars
85
Forks
11
Language
—
License
—
Category
Last pushed
Oct 09, 2018
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/google-research-datasets/query-wellformedness"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
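The same endpoint can be called from Python using only the standard library. This is a minimal sketch: it assumes the endpoint returns a JSON body, since the response schema is not documented here, and `fetch_record` is a hypothetical helper name.

```python
# Minimal sketch of calling the catalog API from Python (stdlib only).
# Assumption: the endpoint returns JSON; the response schema is undocumented.
import json
from urllib.request import urlopen

API_URL = (
    "https://pt-edge.onrender.com/api/v1/quality/nlp/"
    "google-research-datasets/query-wellformedness"
)


def fetch_record(url: str = API_URL) -> dict:
    """Fetch the repository's quality record and parse it as JSON."""
    with urlopen(url) as resp:  # keyless access, rate-limited per day
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a network request):
#     record = fetch_record()
#     print(record)
```

Without a key this is limited to 100 requests per day, so cache responses rather than fetching on every run.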
Higher-rated alternatives
PaddlePaddle/RocketQA
🚀 RocketQA, dense retrieval for information retrieval and question answering, including both...
shuaihuaiyi/QA
A Chinese question-answering system implemented with deep learning algorithms
allenai/deep_qa
A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)
worldbank/iQual
iQual is a package that leverages natural language processing to scale up interpretative...
fhamborg/Giveme5W1H
Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did...