huggingface/datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

Score: 82 / 100 (Verified)

This tool helps AI practitioners quickly get and prepare datasets for training and evaluating machine learning models. You provide a dataset name or your own data files (like images, audio, or text), and it outputs a ready-to-use dataset that can be fed directly into your model. It's designed for machine learning engineers and data scientists working with AI.

21,273 stars. Used by 356 other packages. Actively maintained with 22 commits in the last 30 days. Available on PyPI.

Use this if you need to efficiently load and pre-process a wide variety of public or private datasets for your AI models, especially when dealing with large datasets that exceed system memory.

Not ideal if you are not working with machine learning models or if your data preparation needs are minimal and can be handled with basic scripting.

Topics: machine-learning-engineering, natural-language-processing, computer-vision, audio-processing, data-preparation
Maintenance 20 / 25
Adoption 15 / 25
Maturity 25 / 25
Community 22 / 25

Stars: 21,273
Forks: 3,140
Language: Python
License: Apache-2.0
Last pushed: Mar 12, 2026
Commits (30d): 22
Dependencies: 14
Reverse dependents: 356

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/huggingface/datasets"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
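The same request can be made from Python using only the standard library. This is a sketch under assumptions: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the URL layout is inferred from the single curl example above, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch one record. Assumes the body is JSON (not confirmed by the page)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_quality_url("ml-frameworks", "huggingface", "datasets"))
    # Uncomment to make a live request (counts against the daily limit):
    # print(fetch_quality("ml-frameworks", "huggingface", "datasets"))
```

Keeping URL construction separate from the network call makes it possible to check the request target without spending any of the daily quota.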