CEA-LIST/RPCDataloader
A variant of the PyTorch Dataloader using remote workers.
When training machine learning models, you often need to load large amounts of data from disk. This tool helps machine learning engineers and researchers using PyTorch load data more efficiently by distributing the data loading process across multiple remote machines: given your dataset's location and desired transformations, it produces ready-to-use batches of data for model training.
No commits in the last 6 months. Available on PyPI.
Use this if you are a machine learning engineer or researcher training PyTorch models and are bottlenecked by data loading from a single machine, or if you want to utilize data stored on multiple remote servers without copying it.
Not ideal if your dataset is small enough to fit on a single machine and data loading is not a performance bottleneck, or if you are not using PyTorch for your machine learning workflows.
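To make the idea concrete, here is a minimal sketch of the pattern the library implements: sample loading and transformation are farmed out to a pool of workers, and the results are collated into batches. This sketch uses local threads in place of remote RPC workers, and all names (`load_sample`, `batches`, the worker count) are illustrative assumptions, not RPCDataloader's actual API.

```python
from concurrent.futures import ThreadPoolExecutor


def load_sample(index):
    # Stand-in for reading one sample from disk and applying transforms;
    # with RPCDataloader this work would run on a remote worker instead.
    return index * 2


def batches(indices, batch_size, num_workers=4):
    """Yield batches, with per-sample loading spread across workers."""
    indices = list(indices)
    with ThreadPoolExecutor(num_workers) as pool:
        loaded = list(pool.map(load_sample, indices))
    # Collate loaded samples into fixed-size batches for the training loop.
    for start in range(0, len(loaded), batch_size):
        yield loaded[start:start + batch_size]
```

In the real library the workers live on other machines, so the data they read never has to be copied to the training node; the training loop just consumes batches as it would from a standard PyTorch DataLoader.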
Stars
21
Forks
1
Language
Python
License
—
Category
Last pushed
Apr 01, 2023
Commits (30d)
0
Dependencies
5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/CEA-LIST/RPCDataloader"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference...
helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
bsc-wdc/dislib
The Distributed Computing library for python implemented using PyCOMPSs programming model for HPC.