Youhe-Jiang/IJCAI2023-OptimalShardedDataParallel

[IJCAI2023] An automated parallel training system that combines the advantages of data and model parallelism. If you are interested, please visit, star, or fork https://github.com/Youhe-Jiang/OptimalShardedDataParallel

/ 100

Emerging

This project helps machine learning engineers and researchers train very large deep learning models such as GPT or OPT across multiple GPUs. It takes an existing PyTorch model and automatically decides how data and model components are distributed across the hardware, so that larger models can be trained faster and with less memory overhead than with standard methods.
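To give a rough sense of why sharding model state across GPUs lowers memory use, here is a back-of-the-envelope sketch. It is not this project's cost model or API: the function name `model_state_gib` is hypothetical, and the 16 bytes per parameter figure is an assumption based on the commonly cited accounting for mixed-precision Adam (2 bytes of fp16 weights, 2 bytes of fp16 gradients, 12 bytes of fp32 optimizer state). Activations, buffers, and communication overhead are ignored.

```python
# Rough per-GPU memory estimate for model states only.
# Assumption (not taken from the project): mixed-precision Adam costs
# 2 (fp16 weights) + 2 (fp16 grads) + 12 (fp32 optimizer state) bytes per parameter.
BYTES_PER_PARAM = 16


def model_state_gib(num_params: int, num_gpus: int, sharded: bool) -> float:
    """Return GiB of model state held by each GPU.

    Plain data parallelism replicates the full state on every GPU;
    fully sharded data parallelism splits it evenly across GPUs.
    """
    total_bytes = num_params * BYTES_PER_PARAM
    per_gpu_bytes = total_bytes / num_gpus if sharded else total_bytes
    return per_gpu_bytes / 2**30


if __name__ == "__main__":
    params = 1_300_000_000  # illustrative model size, chosen arbitrarily
    for gpus in (1, 4, 8):
        replicated = model_state_gib(params, gpus, sharded=False)
        sharded = model_state_gib(params, gpus, sharded=True)
        print(f"{gpus} GPUs: replicated {replicated:.1f} GiB, "
              f"sharded {sharded:.1f} GiB per GPU")
```

Under these assumptions the replicated figure stays constant as GPUs are added, while the sharded figure falls in proportion to the GPU count. A real system also has to weigh the extra communication that sharding introduces, which this sketch leaves out.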

No commits in the last 6 months.

Use this if you are a machine learning engineer or researcher struggling to train large deep learning models due to memory limitations or slow training times on multi-GPU setups.

Not ideal if you are working with small models that don't require distributed training or if you are not using PyTorch.

deep-learning-training large-language-models distributed-ml gpu-optimization model-scalability

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 10 / 25

Language: Python

License: MIT

Higher-rated alternatives

deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference...

helmholtz-analytics/heat

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python

hpcaitech/ColossalAI

Making large AI models cheaper, faster and more accessible

horovod/horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

bsc-wdc/dislib

The Distributed Computing library for python implemented using PyCOMPSs programming model for HPC.
