PolyAI-LDN/conversational-datasets

Large datasets for conversational AI

Score: 47 / 100 (Emerging)

This project provides standardized, large-scale datasets of real-world conversations drawn from sources such as Reddit, movie subtitles, and Amazon Q&A. It helps conversational AI researchers and practitioners train and evaluate their models by providing clean, structured conversational turns and responses: raw text from these sources goes in, and ready-to-use training and test sets come out.

1,387 stars. No commits in the last 6 months.

Use this if you are developing or researching conversational AI systems and need vast, diverse datasets of human dialogue to train and benchmark your models.

Not ideal if you need to create conversational datasets from proprietary data sources or require highly specialized dialogue structures not covered by general online conversations.

conversational-ai natural-language-processing dialogue-systems machine-learning-research chatbot-development
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 21 / 25


Stars: 1,387
Forks: 177
Language: Python
License: Apache-2.0
Last pushed: Nov 16, 2019
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/PolyAI-LDN/conversational-datasets"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
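The same endpoint can be called from Python. A minimal sketch using only the standard library; the `ml-frameworks` category segment comes from the URL above, but the JSON response schema is not documented here, so `fetch_quality` is an assumption that simply returns the parsed body:

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL for a given repository."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch the quality record for a repo.

    Assumes the endpoint returns a JSON object; the exact fields
    are not documented here.
    """
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)


# Example (performs a network request, subject to the 100 requests/day limit):
# data = fetch_quality("ml-frameworks", "PolyAI-LDN", "conversational-datasets")
```

Keyless access is rate-limited, so cache responses locally rather than re-fetching on every run.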