explosion/ml-datasets
🌊 Machine learning dataset loaders for testing and example scripts
This tool gives developers quick access to standard machine learning datasets for building and testing natural language processing (NLP) and image recognition models. It provides ready-to-use text data for tasks such as sentiment analysis and question answering, and image data for recognition tasks, so developers have input for training and evaluating their algorithms.
47 stars and 10,308 monthly downloads. Used by 1 other package. Available on PyPI.
Use this if you are a developer building or testing machine learning models and need convenient access to well-known, pre-structured datasets for tasks like text classification or image recognition.
Not ideal if you are a non-developer seeking an out-of-the-box solution to analyze your own specific data or a tool for general data management.
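As a rough illustration of the "loader" idea described above, the sketch below loads a small slice of a sentiment dataset and counts its labels. It assumes the package is installed with `pip install ml-datasets`, that it exposes an `imdb` loader accepting `train_limit` and `dev_limit`, and that the loader returns two lists of `(text, label)` pairs. Check these details against the project's README before relying on them. The helper names `label_counts` and `load_imdb_sample` are hypothetical and not part of the library.

```python
from collections import Counter
from typing import Iterable, Tuple


def label_counts(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count how often each label occurs in a sequence of (text, label) pairs."""
    return Counter(label for _text, label in pairs)


def load_imdb_sample(train_limit: int = 200, dev_limit: int = 50):
    """Load a small IMDB sample (downloads data on first use).

    Assumption: `ml_datasets.imdb` takes `train_limit` / `dev_limit` keyword
    arguments and returns (train, dev), each a list of (text, label) pairs.
    """
    # Imported lazily so the helper above works without the package installed.
    from ml_datasets import imdb

    train_data, dev_data = imdb(train_limit=train_limit, dev_limit=dev_limit)
    return train_data, dev_data
```

Calling `load_imdb_sample()` and passing either returned list to `label_counts` gives a quick check that the classes are reasonably balanced before a model is trained on them.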
Stars: 47
Forks: 16
Language: Python
License: MIT
Category:
Last pushed: Mar 26, 2026
Monthly downloads: 10,308
Commits (30d): 0
Dependencies: 5
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/explosion/ml-datasets"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
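The same request can be made from Python with only the standard library. This is a minimal sketch that assumes the endpoint responds with a JSON body. The response schema is not documented here, so the code returns whatever is parsed without inspecting any fields. The function names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from typing import Any

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository URL, e.g. for explosion/ml-datasets."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> Any:
    """Fetch and parse the quality record for one repository.

    Assumption: the response body is JSON; its structure is not known here.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as response:
        return json.load(response)
```

With the unauthenticated limit of 100 requests/day, it is worth caching the result of `fetch_quality("explosion", "ml-datasets")` locally instead of calling it repeatedly.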
Related frameworks
open-edge-platform/datumaro
Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage...
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with...
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
mlcommons/croissant
Croissant is a high-level format for machine learning datasets that brings together four rich layers.
alan-turing-institute/CleverCSV
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement...