rom1504/img2dataset
Easily turn large sets of image URLs into an image dataset. It can download, resize, and package 100M URLs in 20 hours on one machine.
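To put the headline figure in perspective, the sketch below converts "100M URLs in 20 hours" into an average rate and extrapolates it to a larger list. The helper names are hypothetical, and the extrapolation assumes the rate stays constant, which real downloads (dead links, slow hosts, bandwidth limits) will not guarantee.

```python
# Figures taken from the description: 100M URLs in 20 hours on one machine.
URLS = 100_000_000
HOURS = 20


def urls_per_second(urls: int, hours: float) -> float:
    """Average sustained rate implied by a total count and a wall-clock duration."""
    return urls / (hours * 3600)


def hours_needed(urls: int, rate_per_second: float) -> float:
    """Wall-clock hours to process `urls` at a constant rate (an idealised assumption)."""
    return urls / rate_per_second / 3600


rate = urls_per_second(URLS, HOURS)
print(f"{rate:.0f} URLs/s")
print(f"{hours_needed(1_000_000_000, rate):.0f} h for 1B URLs on one machine")
```

At roughly 1,400 URLs per second, a billion-URL list would take on the order of 200 hours on a single machine under the same assumption.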
This tool helps researchers, AI trainers, and data scientists quickly build large image datasets. You provide a list of image URLs, and it automatically downloads, resizes, and organizes them into a structured collection, optionally with accompanying captions. It's designed for anyone needing to efficiently prepare massive image-text pairs for machine learning model training or other analytical tasks.
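As a rough illustration of the workflow described above, the following sketch assembles an `img2dataset` command line without running it. The flag names (`--url_list`, `--output_folder`, `--image_size`, `--thread_count`, `--input_format`, `--url_col`, `--caption_col`) are the ones recalled from the project's README and should be checked against the installed version; the builder function, file names, and the `url` column name are hypothetical.

```python
import shlex
from typing import List, Optional


def build_img2dataset_command(
    url_list: str,
    output_folder: str,
    image_size: int = 256,
    thread_count: int = 64,
    caption_col: Optional[str] = None,
) -> List[str]:
    """Assemble an img2dataset command line (hypothetical helper; nothing is executed)."""
    cmd = [
        "img2dataset",
        f"--url_list={url_list}",
        f"--output_folder={output_folder}",
        f"--image_size={image_size}",
        f"--thread_count={thread_count}",
    ]
    if caption_col is not None:
        # Captions need tabular input; a CSV with a "url" column is assumed here.
        cmd += ["--input_format=csv", "--url_col=url", f"--caption_col={caption_col}"]
    return cmd


# Plain list of URLs, one per line (file name is a placeholder).
cmd = build_img2dataset_command("myimglist.txt", "output_folder")
print(shlex.join(cmd))

# Image-text pairs from a CSV with "url" and "caption" columns (also placeholders).
print(shlex.join(build_img2dataset_command("pairs.csv", "out", caption_col="caption")))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` once the package is installed from PyPI.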
4,380 stars. Used by 1 other package. Available on PyPI.
Use this if you need to gather millions or even billions of images from the web and organize them into a ready-to-use dataset for training computer vision or multimodal AI models.
Not ideal if you only need to download a few dozen images or if your primary goal is to perform complex image manipulation beyond basic resizing and formatting.
Stars: 4,380
Forks: 372
Language: Python
License: MIT
Category:
Last pushed: Oct 19, 2025
Commits (30d): 0
Dependencies: 11
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/rom1504/img2dataset"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
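The same request can be sketched in Python with only the standard library. The URL layout (`category/owner/repo`) is inferred from the single curl example above, and the response is assumed to be JSON; neither is confirmed by the page, and the function names are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the category/owner/repo layout is an inference."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the API returns JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the URL needs no network access.
print(quality_url("ml-frameworks", "rom1504", "img2dataset"))

# Uncomment to perform the request (counts against the daily quota):
# print(fetch_quality("ml-frameworks", "rom1504", "img2dataset"))
```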
Related frameworks
devrimcavusoglu/pybboxes
Lightweight toolkit for bounding boxes providing conversion between bounding box types and...
PyRetri/PyRetri
Open source deep learning based unsupervised image retrieval toolbox built on PyTorch🔥
Particle1904/DatasetHelpers
Dataset Helper program to automatically select, rescale and tag datasets (composed of image and...
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
haltakov/natural-language-image-search
Search photos on Unsplash using natural language