skywalker023/sodaverse

🥤🧑🏻‍🚀Code and dataset for our EMNLP 2023 paper - "SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization"

/ 100

Emerging

This project helps AI researchers create large datasets of natural-sounding conversations infused with social commonsense. It takes existing social commonsense knowledge bases and distills them into dialogue, producing new, realistic conversational datasets. It is aimed at AI researchers and dialogue system developers who need to train or evaluate conversational AI models.

239 stars.

Use this if you are an AI researcher or developer looking to generate large-scale, high-quality dialogue datasets that reflect real-world social interactions and commonsense understanding.

Not ideal if you need a conversational AI model for knowledge-intensive domains like science, medical advice, or legal consultation, as this model is primarily for social chitchat.

conversational-ai dialogue-systems natural-language-processing ai-research language-model-training

No Package, No Dependents

Maintenance 10 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 11 / 25


Stars

239

Forks

Language

Python

License

MIT

Higher-rated alternatives

monarch-initiative/ontogpt

LLM-based ontological extraction tools, including SPIRES

weAIDB/awesome-data-llm

Official Repository of "LLM × DATA" Survey Paper

AXYZdong/AMchat

AM (Advanced Mathematics) Chat is a large language model that integrates advanced mathematical...

Y-Research-SBU/TimeSeriesScientist

Official Repository for TimeSeriesScientist

open-chinese/poetry-collection

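The Chinese "Complete Collection of Poetry": the most comprehensive and systematic Chinese poetry dataset to date, with unified data modeling.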
