PolyAI-LDN/conversational-datasets

Large datasets for conversational AI

Score: 47 / 100 (Emerging)

This project provides standardized, large-scale datasets of real-world conversations drawn from sources such as Reddit, movie subtitles, and Amazon Q&A. It helps conversational AI researchers and practitioners train and evaluate their models by providing clean, structured conversational turns and responses: raw text from these sources goes in, and ready-to-use training and test sets come out.

1,387 stars. No commits in the last 6 months.

Use this if you are developing or researching conversational AI systems and need vast, diverse datasets of human dialogue to train and benchmark your models.

Not ideal if you need to create conversational datasets from proprietary data sources or require highly specialized dialogue structures not covered by general online conversations.

conversational-ai natural-language-processing dialogue-systems machine-learning-research chatbot-development
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 21 / 25


Stars: 1,387
Forks: 177
Language: Python
License: Apache-2.0
Last pushed: Nov 16, 2019
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/PolyAI-LDN/conversational-datasets"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
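The same endpoint can be called from Python. A minimal sketch using only the standard library; the `ml-frameworks` category segment comes from the URL above, but the JSON response schema is not documented here, so `fetch_quality` is an assumption that simply returns the parsed body:

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL for a given repository."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch the quality record for a repo.

    Assumes the endpoint returns a JSON object; the exact fields
    are not documented here.
    """
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)


# Example (performs a network request, subject to the 100 requests/day limit):
# data = fetch_quality("ml-frameworks", "PolyAI-LDN", "conversational-datasets")
```

Keyless access is rate-limited, so cache responses locally rather than re-fetching on every run.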