PolyAI-LDN/conversational-datasets
Large datasets for conversational AI
This project provides large, standardized datasets of real-world conversations drawn from sources such as Reddit, movie subtitles, and Amazon Q&A. It helps conversational AI researchers and practitioners train and evaluate their models by supplying clean, structured conversational turns paired with their responses. Starting from the raw text of these sources, it produces ready-to-use training and test datasets.
1,387 stars. No commits in the last 6 months.
Use this if you are developing or researching conversational AI systems and need vast, diverse datasets of human dialogue to train and benchmark your models.
Not ideal if you need to create conversational datasets from proprietary data sources or require highly specialized dialogue structures not covered by general online conversations.
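To make "structured conversational turns and responses" concrete, here is a minimal sketch of how one conversation thread could be split into context/response pairs. The function name `thread_to_examples` and the field names `context`, `response`, and `context/N` are illustrative assumptions, not taken from the project's documentation, and the project's real output format may differ.

```python
from typing import Dict, List


def thread_to_examples(turns: List[str], max_extra_contexts: int = 2) -> List[Dict[str, str]]:
    """Split an ordered list of utterances into context/response examples.

    Hypothetical helper for illustration only; field names are assumptions.
    """
    examples = []
    for i in range(1, len(turns)):
        # The turn immediately before the response is the main context.
        example = {"context": turns[i - 1], "response": turns[i]}
        # Earlier turns are attached most-recent-first, capped at max_extra_contexts.
        earlier = turns[: i - 1][::-1][:max_extra_contexts]
        for j, text in enumerate(earlier):
            example[f"context/{j}"] = text
        examples.append(example)
    return examples


if __name__ == "__main__":
    thread = ["Any good sci-fi lately?", "Try Dune.", "Thanks, will do!"]
    for ex in thread_to_examples(thread):
        print(ex)
```

Under these assumptions, each example pairs a response with the turn directly before it, and any older turns are carried along as extra context so a model can be trained or evaluated on response selection.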
Stars: 1,387
Forks: 177
Language: Python
License: Apache-2.0
Category:
Last pushed: Nov 16, 2019
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/PolyAI-LDN/conversational-datasets"
Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
Higher-rated alternatives
Pinafore/qb
QANTA Quiz Bowl AI
KristiyanVachev/Question-Generation
Generating multiple choice questions from text using Machine Learning.
wuba/qa_match
A simple effective ToolKit for short text matching
mcQA-suite/mcQA
🔮 Answering multiple choice questions with Language Models.
dapurv5/awesome-question-answering
Resources, datasets, papers on Question Answering