webdataset/WebDataset.jl

A high-performance I/O library for deep learning in Julia, based on the PyTorch WebDataset library.

Score: 33 / 100 (Emerging)

WebDataset.jl helps deep learning practitioners load massive datasets efficiently for model training. It reads collections of tar files, each containing groups of related files (such as an image and its label), and produces ready-to-use batches of data. Data scientists and machine learning engineers working with large image, audio, or text datasets can use it to speed up their training workflows.
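The tar-based layout described above can be illustrated without the library itself. The sketch below is written in Python using only the standard library and is not WebDataset.jl's API. It assumes the naming convention of the Python WebDataset library, where files that share a basename (for example `000.jpg` and `000.cls`) form one sample. The helper names `build_shard`, `iter_samples`, and `batched` are hypothetical and exist only for this illustration.

```python
import io
import tarfile


def build_shard(files):
    """Build an in-memory tar archive from a {name: bytes} mapping (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def iter_samples(fileobj):
    """Yield one dict per sample, grouping consecutive tar members by basename."""
    current_key, sample = None, {}
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Assumed convention: the key is the path up to the first dot
            # of the file name; everything after that dot names the field.
            dirname, _, filename = member.name.rpartition("/")
            stem, _, ext = filename.partition(".")
            key = f"{dirname}/{stem}" if dirname else stem
            if key != current_key and sample:
                yield sample
                sample = {}
            current_key = key
            sample.setdefault("__key__", key)
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield sample


def batched(samples, batch_size):
    """Collect samples into lists of at most batch_size items."""
    batch = []
    for item in samples:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


# Placeholder payloads stand in for real image bytes and labels.
shard = build_shard({
    "000.jpg": b"<jpeg bytes 0>",
    "000.cls": b"3",
    "001.jpg": b"<jpeg bytes 1>",
    "001.cls": b"7",
    "002.jpg": b"<jpeg bytes 2>",
    "002.cls": b"1",
})

batches = list(batched(iter_samples(shard), batch_size=2))
for batch in batches:
    print([(s["__key__"], sorted(k for k in s if k != "__key__")) for s in batch])
```

Because related files sit next to each other in the archive, a loader along these lines can stream a shard sequentially instead of opening many small files, which is the situation this kind of format targets.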

Use this if you are a machine learning engineer or data scientist training deep learning models in Julia and need to handle very large datasets efficiently, especially when dealing with many small files.

Not ideal if your dataset is small, already in a single, easily loadable file format (like a CSV for tabular data), or if you are not working with deep learning models.

deep-learning machine-learning-training large-scale-data image-recognition data-loading-optimization
No Package, No Dependents
Maintenance 6 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 6 / 25

Stars: 14
Forks: 1
Language: Julia
License: MIT
Last pushed: Dec 18, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/webdataset/WebDataset.jl"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.