skywalker023/sodaverse

🥤🧑🏻‍🚀Code and dataset for our EMNLP 2023 paper - "SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization"

Score: 47 / 100 (Emerging)

This project helps AI researchers create large datasets of natural-sounding conversations infused with social commonsense. It takes existing social commonsense knowledge bases and distills them into dialogue, producing new, realistic conversational datasets. AI researchers and dialogue system developers who need to train or evaluate conversational AI models would use this.

239 stars.

Use this if you are an AI researcher or developer looking to generate large-scale, high-quality dialogue datasets that reflect real-world social interactions and commonsense understanding.

Not ideal if you need a conversational AI model for knowledge-intensive domains such as science, medical advice, or legal consultation, because this project is geared toward social chitchat.

conversational-ai dialogue-systems natural-language-processing ai-research language-model-training
No Package, No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 11 / 25

Stars 239
Forks 14
Language Python
License MIT
Last pushed Jan 23, 2026
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/skywalker023/sodaverse"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
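
For scripted use, the same request as the curl command above can be sketched in Python using only the standard library. This is a minimal sketch under two assumptions: that the endpoint returns a JSON body (the page does not document the response format), and that the path follows an `<owner>/<repo>` pattern, as the one example URL suggests. The helper names `build_quality_url` and `fetch_quality` are hypothetical and not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base URL copied from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Hypothetical helper: the <owner>/<repo> path pattern is inferred
    from the single example URL, not from API documentation.
    """
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body.

    Assumes the response is JSON; no field names are assumed, so the
    decoded object is returned as-is for the caller to inspect.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("skywalker023", "sodaverse")` issues the same unauthenticated request as the curl command, so it counts against the 100 requests/day limit. How a free key is supplied is not stated on this page, so the sketch does not attempt it.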