benedekrozemberczki/datasets
A repository of pretty cool datasets that I collected for network science and machine learning research.
This collection provides social network datasets derived from platforms such as Twitch, LastFM, Deezer, and GitHub. The data includes user connections and, for some datasets, user attributes, and can be used to analyze social structure and to predict user properties such as language, churn, or gender. It is aimed at data scientists, machine learning researchers, and social scientists working with graph-based analysis.
Use this if you need pre-collected, real-world social network graphs for tasks like predicting user characteristics, understanding community structures, or evaluating graph-based machine learning models.
Not ideal if you need continuously updated, real-time data or highly specialized datasets not focused on social network interactions.
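As a rough illustration of the graph-based analysis described above, the sketch below builds an undirected adjacency map from a two-column edge list and counts each node's connections. The sample rows and the column names `id_1` and `id_2` are invented for the example; the actual file layout of each dataset is not described here and may differ.

```python
import csv
import io
from collections import defaultdict

# Toy edge list, made up for illustration only. The real dataset files
# and their column names are an assumption and may look different.
SAMPLE_EDGES = """id_1,id_2
0,1
0,2
1,2
2,3
"""


def load_adjacency(handle):
    """Build an undirected adjacency map from a two-column CSV edge list."""
    adjacency = defaultdict(set)
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    for source, target in reader:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def degrees(adjacency):
    """Return the number of distinct neighbors for every node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


adjacency = load_adjacency(io.StringIO(SAMPLE_EDGES))
node_degrees = degrees(adjacency)
print(sorted(node_degrees.items()))
```

To try this on a downloaded file, pass an open file handle to `load_adjacency` instead of the in-memory sample, adjusting the header handling if the file has no header row.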
Stars: 651
Forks: 83
Language: —
License: MIT
Category:
Last pushed: Dec 20, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/benedekrozemberczki/datasets"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
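The same request can be sketched in Python using only the standard library. The response format is not documented here, so the JSON decoding in `fetch_quality` is an assumption, and reading the URL path as category, owner, and repository is an interpretation of the curl example rather than a documented scheme. Both function names are hypothetical helpers.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category, owner, repo):
    """Assemble the endpoint URL; the segment meanings are an assumption."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Request the endpoint and decode the body, assuming it is JSON."""
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_url("ml-frameworks", "benedekrozemberczki", "datasets")
print(url)
# Calling fetch_quality("ml-frameworks", "benedekrozemberczki", "datasets")
# performs the network request and counts against the daily request limit.
```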
Related frameworks
open-edge-platform/datumaro
Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage...
explosion/ml-datasets
🌊 Machine learning dataset loaders for testing and example scripts
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with...
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
mlcommons/croissant
Croissant is a high-level format for machine learning datasets that brings together four rich layers.