google-research-datasets/query-wellformedness
25,100 queries from the Paralex corpus (Fader et al., 2013) annotated with human ratings of whether they are well-formed natural language questions.
This dataset contains 25,100 everyday questions, each rated by multiple annotators on whether it reads as a well-formed natural language question. Every entry pairs a raw question with its human well-formedness rating, making the dataset a benchmark for what counts as a clear, grammatical question. It is useful for anyone building or evaluating systems that must understand or generate human-like questions, such as conversational-bot designers or NLP researchers.
No commits in the last 6 months.
Use this if you need to train or evaluate a system that processes or generates natural language questions and requires a benchmark for well-formedness.
Not ideal if you're looking for factual answers to questions or a dataset of domain-specific inquiries.
Stars
85
Forks
11
Language
—
License
—
Category
Last pushed
Oct 09, 2018
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/google-research-datasets/query-wellformedness"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
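The same endpoint can be called from Python using only the standard library. This is a minimal sketch: it assumes the endpoint returns a JSON body, since the response schema is not documented here, and `fetch_record` is a hypothetical helper name.

```python
# Minimal sketch of calling the catalog API from Python (stdlib only).
# Assumption: the endpoint returns JSON; the response schema is undocumented.
import json
from urllib.request import urlopen

API_URL = (
    "https://pt-edge.onrender.com/api/v1/quality/nlp/"
    "google-research-datasets/query-wellformedness"
)


def fetch_record(url: str = API_URL) -> dict:
    """Fetch the repository's quality record and parse it as JSON."""
    with urlopen(url) as resp:  # keyless access, rate-limited per day
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a network request):
#     record = fetch_record()
#     print(record)
```

Without a key this is limited to 100 requests per day, so cache responses rather than fetching on every run.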
Higher-rated alternatives
PaddlePaddle/RocketQA
🚀 RocketQA, dense retrieval for information retrieval and question answering, including both...
shuaihuaiyi/QA
A Chinese question-answering system implemented with deep learning algorithms
allenai/deep_qa
A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)
worldbank/iQual
iQual is a package that leverages natural language processing to scale up interpretative...
fhamborg/Giveme5W1H
Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did...