sigvt/vtuber-livechat-dataset
📊 VTuber 1B: Billion-scale Live Chat and Moderation Event Dataset
This project provides a massive collection of live chat messages, Super Chats (paid messages), and moderation events (such as bans and deletions) from the live streams of virtual YouTubers (VTubers). It is designed for researchers, social scientists, and anyone studying online communities, trends, and content moderation. You can analyze the raw chat data or the pre-calculated statistics to understand viewer engagement, identify common spam or toxic phrases, or visualize demographic patterns.
No commits in the last 6 months.
Use this if you need a large-scale dataset of real-world live stream interactions to study audience behavior, language use, or the effectiveness of moderation in online entertainment communities.
Not ideal if you're looking for a small, curated dataset for quick qualitative analysis or if your research focus is outside of virtual streamer communities.
Stars: 92
Forks: 5
Language: Python
License: MIT
Category: (none listed)
Last pushed: Aug 04, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/sigvt/vtuber-livechat-dataset"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
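For scripted access, the curl request above can be reproduced from Python with only the standard library. This is a minimal sketch under stated assumptions: the `<category>/<owner>/<repo>` path pattern is inferred from the single example URL, the response is assumed to be JSON, and the helper names `quality_url` and `fetch_quality` are hypothetical. It uses the anonymous tier only, since how an API key is supplied is not documented here.

```python
import json
import urllib.request

# Base path taken from the curl example; everything after it is an assumed pattern.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Hypothetical helper: build the endpoint URL, assuming the
    <category>/<owner>/<repo> layout seen in the example URL."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Hypothetical helper: GET the endpoint anonymously (100 requests/day)
    and decode the body, assuming it is UTF-8 encoded JSON."""
    url = quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `fetch_quality("ml-frameworks", "sigvt/vtuber-livechat-dataset")` requests the same URL as the curl command. URL construction is kept separate from the network call so it can be checked offline.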
Higher-rated alternatives
KOKOSde/localmod
Self-hosted content moderation API that outperforms Amazon Comprehend. 100% offline, your data...
Kalebu/Plagiarism-checker-Python
A Python project for checking plagiarism of documents based on cosine similarity
credo-ai/credoai_lens
Credo AI Lens is a comprehensive assessment framework for AI systems. Lens standardizes model...
jina-ai/example-app-store
App store search example, using Jina as backend and Streamlit as frontend
ogulcanaydogan/AI-Provenance-Tracker
Open-source multi-modal AI content detection platform, analyses text, images, audio, and video...