alibaba/EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
Training large-scale deep learning models often requires significant computational power. EPL helps deep learning engineers train larger, more complex models efficiently across multiple GPUs. You provide your existing model code, and the library optimizes how it runs across your hardware, so larger models train faster and use less memory.
271 stars. No commits in the last 6 months.
Use this if you are a deep learning engineer struggling to train very large models due to memory constraints or slow training times, and you want to leverage distributed computing without extensive manual setup.
Not ideal if you are working with small models that don't require distributed training, or if you prefer to manually manage all aspects of your parallel training setup.
Stars: 271
Forks: 50
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 31, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/alibaba/EasyParallelLibrary"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
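The same request can be made from Python. The sketch below is a minimal example using only the standard library. It rests on two assumptions that this page does not confirm: that the URL path follows a category/owner/repository pattern (inferred from the single curl command above), and that the endpoint returns a JSON body. The helper names `quality_url` and `fetch_quality` are hypothetical, and no response field names are assumed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    one example URL shown on this page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL] + parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch and decode the response.

    Assumption: the body is JSON. The structure of the returned data is
    not documented here, so it is returned as-is.
    """
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, counts against the daily limit):
# data = fetch_quality("ml-frameworks", "alibaba", "EasyParallelLibrary")
# print(data)
```

Since unauthenticated access is limited to 100 requests/day, caching the decoded result locally is a reasonable choice when polling several repositories.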
Higher-rated alternatives
deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference...
helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
bsc-wdc/dislib
The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.