huggingface/datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

82 / 100

Verified

This tool helps AI practitioners quickly get and prepare datasets for training and evaluating machine learning models. You provide a dataset name or your own data files (like images, audio, or text), and it outputs a ready-to-use dataset that can be fed directly into your model. It's designed for machine learning engineers and data scientists working with AI.

21,273 stars. Used by 356 other packages. Actively maintained with 22 commits in the last 30 days. Available on PyPI.

Use this if you need to efficiently load and pre-process a wide variety of public or private datasets for your AI models, especially when dealing with large datasets that exceed system memory.

Not ideal if you are working outside machine learning, or if your data preparation is simple enough to handle with basic scripting.

machine-learning-engineering natural-language-processing computer-vision audio-processing data-preparation

Maintenance 20 / 25

Adoption 15 / 25

Maturity 25 / 25

Community 22 / 25


Stars 21,273

Forks 3,140

Language Python

License Apache-2.0

Community Discussion

Datasets for Reconstructing Visual Perception from Brain Data
62 points · 15 comments · Mar 2026

Recent Releases

4.8.4 (23 Mar 2026)
4.8.3 (19 Mar 2026)
4.8.2 (17 Mar 2026)
4.8.1 (17 Mar 2026)
4.8.0 (16 Mar 2026)


Related frameworks

allenporter/home-assistant-datasets

This package is a collection of datasets for evaluating AI Models in the context of Home Assistant.

little1d/SpectrumLab

A pioneering unified platform designed to systematize and accelerate deep learning research in...

J0nasW/science-datalake

Unified data lake of 293M scientific papers from 8 scholarly sources + 13 ontologies (960 GB...
