rom1504/img2dataset

Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20 hours on one machine.

Score: 61 / 100 (Established)

This tool helps researchers, AI practitioners, and data scientists quickly build large image datasets. You provide a list of image URLs, and it automatically downloads, resizes, and organizes the images into a structured collection, optionally with accompanying captions. It is designed for anyone who needs to efficiently prepare large collections of image-text pairs for machine learning model training or other analytical tasks.
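As a rough illustration of the workflow described above, the following sketch assembles arguments for img2dataset's Python entry point, `download`. The helper names `build_download_args` and `run`, the file paths, the column names `url` and `caption`, and the process and thread counts are all assumptions chosen for illustration; consult the project's own documentation for the full option list and recommended values.

```python
def build_download_args(url_list, output_folder, image_size=256, with_captions=False):
    """Hypothetical helper: collect keyword arguments for img2dataset.download."""
    args = {
        "url_list": url_list,            # path to the file that lists image URLs
        "output_folder": output_folder,  # where the packaged dataset is written
        "image_size": image_size,        # target size used when resizing
        "input_format": "parquet" if with_captions else "txt",
        "output_format": "webdataset",   # tar shards; "files" is another option
        "processes_count": 4,            # illustrative value, tune per machine
        "thread_count": 32,              # illustrative value, tune per machine
    }
    if with_captions:
        # Assumed column names in the input table; adjust to match your data.
        args["url_col"] = "url"
        args["caption_col"] = "caption"
    return args


def run(args):
    """Start the download. Requires `pip install img2dataset` and network access."""
    from img2dataset import download  # imported lazily so the helper works without it
    download(**args)
```

Calling `run(build_download_args("urls.txt", "dataset"))` would start a plain URL-list download, while passing `with_captions=True` with a Parquet input keeps the caption column alongside each image.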

4,380 stars. Used by 1 other package. Available on PyPI.

Use this if you need to gather millions or even billions of images from the web and organize them into a ready-to-use dataset for training computer vision or multimodal AI models.

Not ideal if you only need to download a few dozen images or if your primary goal is to perform complex image manipulation beyond basic resizing and formatting.

AI training data, computer vision, large-scale image collection, machine learning datasets, multimodal AI
Maintenance 6 / 25
Adoption 11 / 25
Maturity 25 / 25
Community 19 / 25


Stars: 4,380
Forks: 372
Language: Python
License: MIT
Last pushed: Oct 19, 2025
Commits (30d): 0
Dependencies: 11
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/rom1504/img2dataset"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
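
The same request can be made from Python with the standard library. This is a minimal sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, and it assumes the endpoint returns JSON, which the page does not state explicitly.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL in the same shape as the curl example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch and decode the response (assumed to be JSON). Needs network access."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For this project the call would be `fetch_quality("ml-frameworks", "rom1504", "img2dataset")`. Keep the daily request limit in mind when calling it in a loop.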