jfainberg/self_dialogue_corpus
The Self-dialogue Corpus - a collection of self-dialogues across music, movies and sports
This corpus provides a large collection of self-dialogues, in which individuals hold conversations with themselves on topics such as movies, music, and sports. The repository takes the raw conversation data and can output it as clean, formatted text files. It is useful for natural language processing researchers and developers who want to train or evaluate conversational AI models.
107 stars. No commits in the last 6 months.
Use this if you need a specialized dataset of self-dialogues, where one person writes both sides of a conversation, to improve the naturalness and breadth of your conversational AI.
Not ideal if you are looking for multi-party conversations or for data from domains outside the movie, music, and sports topics provided.
Stars: 107
Forks: 24
Language: Python
License: BSD-3-Clause
Category:
Last pushed: Mar 19, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/jfainberg/self_dialogue_corpus"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
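The same request can be sketched in Python using only the standard library. This is a minimal sketch, not a documented client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the code assumes the endpoint returns a JSON body, which the page does not state.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository and decode it.

    Assumption: the response body is JSON. The page does not document
    the response format, so adjust the parsing if it differs.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body)
```

Calling `fetch_quality("jfainberg", "self_dialogue_corpus")` issues the same GET request as the curl command. Unauthenticated use counts against the 100 requests/day limit mentioned above, so caching the result locally is a sensible choice for repeated runs.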
Higher-rated alternatives
gunthercox/chatterbot-corpus: A multilingual dialog corpus
EdinburghNLP/awesome-hallucination-detection: List of papers on hallucination detection in LLMs.
jkkummerfeld/irc-disentanglement: Dataset and model for disentangling chat on IRC
Tomiinek/MultiWOZ_Evaluation: Unified MultiWOZ evaluation scripts for the context-to-response task.
tae898/multimodal-datasets: Multimodal datasets.