jbrownlee/Datasets
Machine learning datasets used in tutorials on MachineLearningMastery.com
This collection provides a stable source for datasets commonly used in machine learning exercises. It offers clean, pre-formatted CSV files for tasks like predicting outcomes from medical records or financial data, classifying images, forecasting sales, or analyzing text. Data scientists, students, and practitioners learning or experimenting with machine learning models will find these useful for practice and for validating algorithms.
1,224 stars. No commits in the last 6 months.
Use this if you need reliable, pre-processed datasets for training and testing machine learning models in classification, regression, or time series analysis.
Not ideal if you require real-time data feeds, highly specialized domain-specific datasets not listed, or if your primary need is for raw, unprocessed data for feature engineering practice.
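As a rough illustration of the intended use, the sketch below loads one of the repository's CSV files and separates features from labels. It rests on several assumptions that are not stated on this page: that files are served from the `master` branch via `raw.githubusercontent.com`, that the chosen file has no header row, that every feature column is numeric, and that the label sits in the last column. The helper names (`raw_url`, `parse_csv`, `fetch_rows`, `split_xy`) are hypothetical.

```python
import csv
import io
import urllib.request
from typing import List, Tuple

# Assumption: files are served from the "master" branch of the repository.
RAW_BASE = "https://raw.githubusercontent.com/jbrownlee/Datasets/master"


def raw_url(filename: str) -> str:
    """Build the raw-download URL for a file in the repository (hypothetical helper)."""
    return RAW_BASE + "/" + filename


def parse_csv(text: str) -> List[List[str]]:
    """Parse CSV text into rows of strings, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def fetch_rows(filename: str, timeout: float = 10.0) -> List[List[str]]:
    """Download a CSV file from the repository and parse it (needs network access)."""
    with urllib.request.urlopen(raw_url(filename), timeout=timeout) as resp:
        return parse_csv(resp.read().decode("utf-8"))


def split_xy(rows: List[List[str]]) -> Tuple[List[List[float]], List[str]]:
    """Split rows into numeric features and labels.

    Assumption: no header row, numeric features, label in the last column.
    """
    features = [[float(value) for value in row[:-1]] for row in rows]
    labels = [row[-1] for row in rows]
    return features, labels
```

A call such as `split_xy(fetch_rows("pima-indians-diabetes.csv"))` would then return feature rows and labels, assuming a file with that name exists in the repository and matches the layout described above. Files with a header row or text columns would need their own handling.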
Stars: 1,224
Forks: 1,497
Language: not listed
License: not listed
Category:
Last pushed: Aug 15, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/jbrownlee/Datasets"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
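The same request can be made from Python with the standard library. This is a minimal sketch under stated assumptions: the path pattern `category/owner/repo` is inferred from the single curl example above, and the response is assumed to be JSON, which this page does not confirm. The function names are hypothetical, and how an API key is passed is not shown because the page does not say.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example; everything after it is an inferred pattern.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [urllib.parse.quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([API_BASE] + parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository (needs network access).

    Assumption: the endpoint returns a JSON body.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "jbrownlee", "Datasets")` issues the same request as the curl command. With the keyless limit of 100 requests/day, caching the result locally is worth considering for anything run repeatedly.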
Higher-rated alternatives
open-edge-platform/datumaro
Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage...
explosion/ml-datasets
🌊 Machine learning dataset loaders for testing and example scripts
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with...
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
mlcommons/croissant
Croissant is a high-level format for machine learning datasets that brings together four rich layers.