huggingface/datasets
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
This tool helps AI practitioners quickly get and prepare datasets for training and evaluating machine learning models. You provide a dataset name or your own data files (like images, audio, or text), and it outputs a ready-to-use dataset that can be fed directly into your model. It's designed for machine learning engineers and data scientists working with AI.
21,273 stars. Used by 356 other packages. Actively maintained with 22 commits in the last 30 days. Available on PyPI.
Use this if you need to efficiently load and pre-process a wide variety of public or private datasets for your AI models, especially when dealing with large datasets that exceed system memory.
Not ideal if you are not working with machine learning models or if your data preparation needs are minimal and can be handled with basic scripting.
Stars: 21,273
Forks: 3,140
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 12, 2026
Commits (30d): 22
Dependencies: 14
Reverse dependents: 356
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/huggingface/datasets"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
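The same request can be sketched in Python with only the standard library. The helper names below are hypothetical, and reading the URL path as category, owner, and repository is an assumption inferred from the curl example above, not documented behavior. The response format is not described here, so the sketch returns the raw body as text.

```python
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    # Assumption: the path segments are category/owner/repo,
    # inferred from the single example URL.
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    # Performs the GET request and returns the response body as text.
    # No API key is sent, matching the keyless tier mentioned above.
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


url = build_quality_url("ml-frameworks", "huggingface", "datasets")
print(url)
```

Calling `fetch_quality("ml-frameworks", "huggingface", "datasets")` would issue the actual request and count against the daily limit.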
Related frameworks
allenporter/home-assistant-datasets
This package is a collection of datasets for evaluating AI Models in the context of Home Assistant.
little1d/SpectrumLab
A pioneering unified platform designed to systematize and accelerate deep learning research in...
J0nasW/science-datalake
Unified data lake of 293M scientific papers from 8 scholarly sources + 13 ontologies (960 GB...