rentainhe/pytorch-distributed-training
Simple tutorials on Pytorch DDP training
Training large neural networks can be slow, but this project provides code examples and tutorials to speed up model training by distributing the workload across multiple GPUs on a single machine. It helps machine learning engineers and researchers optimize their deep learning experiments, taking a single Pytorch model and dataset as input and producing a faster training workflow.
286 stars. No commits in the last 6 months.
Use this if you are a machine learning engineer or researcher struggling with long training times for your deep learning models on a single machine with multiple GPUs.
Not ideal if you are looking for distributed training across multiple machines or if you are not using PyTorch for your deep learning projects.
Stars
286
Forks
49
Language
Python
License
—
Category
Last pushed
Aug 19, 2022
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/rentainhe/pytorch-distributed-training"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference...
helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
bsc-wdc/dislib
The Distributed Computing library for python implemented using PyCOMPSs programming model for HPC.