BaguaSys/bagua
Bagua Speeds up PyTorch
Bagua helps machine learning engineers and researchers accelerate the training of deep learning models in PyTorch. It takes an existing PyTorch model and training script and applies optimizations such as distributed training across multiple GPUs or machines, faster data loading, and communication enhancements to significantly shorten training time. It is aimed primarily at users training large-scale models, especially in fields like computer vision and natural language processing, who need to reduce training times.
884 stars. No commits in the last 6 months. Available on PyPI.
Use this if you are a machine learning engineer or researcher using PyTorch and need to drastically speed up the training time of your deep learning models, particularly when working with large datasets or models that require multi-GPU or multi-machine setups.
Not ideal if you are not using PyTorch, or if your deep learning models are small and train quickly on a single GPU, as the overhead may outweigh the benefits.
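To make the "takes your existing PyTorch model and training script" claim concrete, here is a minimal sketch of the wrapping pattern Bagua's README describes. The helper name `wrap_with_bagua` is my own, and the exact module paths and algorithm class should be checked against Bagua's current documentation; this is an illustration, not a definitive recipe.

```python
# Sketch: wrapping an existing PyTorch model/optimizer pair with Bagua.
# Assumes Bagua is installed and the script is started with its launcher,
# e.g.: python -m bagua.distributed.launch --nproc_per_node=N train.py

def wrap_with_bagua(model, optimizer):
    """Hypothetical helper: initialize Bagua's process group and wrap the
    model with its gradient-allreduce algorithm (the standard data-parallel
    strategy). Imports are deferred so the sketch reads without Bagua."""
    import bagua.torch_api as bagua
    from bagua.torch_api.algorithms import gradient_allreduce

    bagua.init_process_group()  # one process per GPU, set up by the launcher
    model = model.cuda()
    return model.with_bagua(
        [optimizer],
        gradient_allreduce.GradientAllReduceAlgorithm(),
    )
```

The rest of the training loop stays unchanged, which is the point of the library: distribution is added around the model rather than rewritten into it.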
Stars
884
Forks
81
Language
Python
License
MIT
Category
ml-frameworks
Last pushed
Aug 01, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/BaguaSys/bagua"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related frameworks
deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference...
helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
bsc-wdc/dislib
The Distributed Computing library for python implemented using PyCOMPSs programming model for HPC.