chatterbot-corpus and self_dialogue_corpus
These are complementary resources—the first provides general-purpose multilingual conversational training data for building dialogue systems, while the second offers domain-specific self-dialogue data for training systems that generate internal reasoning or multi-turn reasoning chains, particularly in entertainment domains.
About chatterbot-corpus
gunthercox/chatterbot-corpus
A multilingual dialog corpus
This project helps you quickly set up a conversational chatbot. It provides pre-written example dialogues in various languages, which you feed into your chatbot to give it a foundation for understanding and responding to common inputs. This is ideal for anyone building a chatbot who wants to give it a diverse set of responses right from the start.
About self_dialogue_corpus
jfainberg/self_dialogue_corpus
The Self-dialogue Corpus - a collection of self-dialogues across music, movies and sports
This corpus provides a large collection of self-dialogues, where individuals engage in conversations with themselves across diverse topics like movies, music, and sports. It takes raw conversation data and can output it as clean, formatted text files. It is useful for researchers and developers in natural language processing looking to train or evaluate conversational AI models.
Related comparisons
Scores updated daily from GitHub, PyPI, and npm data. How scores work