hkproj/pytorch-transformer-distributed
Distributed training (multi-node) of a Transformer model
This project helps you train large Transformer models more efficiently by distributing the computational load across multiple GPU-enabled machines. You provide the Transformer model code and training data, and training completes faster than it would on a single machine. It is aimed at machine learning engineers and researchers working with models that are too large or slow to train on one node.
No commits in the last 6 months.
Use this if you need to accelerate the training of a large transformer model that is too computationally intensive for a single GPU machine.
Not ideal if you are training smaller models or do not have access to a multi-node, multi-GPU cloud computing environment.
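The repository is PyTorch-based; below is a minimal, generic sketch of the multi-node data-parallel pattern such a project relies on (process-group initialization, DistributedDataParallel wrapping, per-rank data sharding). It is not the repository's actual training script: the placeholder linear model, file name, and torchrun arguments are illustrative assumptions only.

# Minimal sketch of multi-node data-parallel training with PyTorch DDP.
# Not the repository's code; uses a placeholder model and random data.
# Launch on each node with, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=4 --rdzv_backend=c10d \
#            --rdzv_endpoint=<master-host>:29500 train_ddp_sketch.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for every process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and data; a real run would build the Transformer here.
    model = torch.nn.Linear(512, 512).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 512), torch.randn(1024, 512))
    sampler = DistributedSampler(dataset)          # each rank sees a distinct shard
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                   # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                        # DDP all-reduces gradients across ranks
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()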
Stars: 94
Forks: 40
Language: Python
License: —
Category: ml-frameworks
Last pushed: Apr 10, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/hkproj/pytorch-transformer-distributed"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
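If you prefer Python over curl, a short sketch using the requests library (assumed installed via pip install requests) fetches the same endpoint anonymously; the response body is printed as-is rather than assuming any particular field layout.

# Sketch: query the same endpoint from Python; prints whatever JSON the API returns.
import requests

url = (
    "https://pt-edge.onrender.com/api/v1/quality/"
    "ml-frameworks/hkproj/pytorch-transformer-distributed"
)
resp = requests.get(url, timeout=10)   # anonymous access: 100 requests/day
resp.raise_for_status()
print(resp.json())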
Higher-rated alternatives
deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference...
helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
bsc-wdc/dislib
The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.