Hsword/Hetu
A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you are interested, please visit, star, or fork https://github.com/PKU-DAIR/Hetu.
This system helps machine learning engineers and researchers efficiently train very large deep learning models, particularly those with trillions of parameters. It takes your raw data and a defined deep learning model, then outputs a highly optimized, trained model ready for deployment. It is aimed at professionals working with massive datasets and complex models who need faster training times and better scalability.
124 stars. No commits in the last 6 months.
Use this if you need to train deep learning models that are so large they require distributed computing across many GPUs or CPU nodes, and you want to achieve significant speedups compared to traditional frameworks.
Not ideal if you are working with smaller models that can be trained efficiently on a single machine or with standard deep learning libraries without needing advanced distributed optimization.
Stars: 124
Forks: 59
Language: Python
License: Apache-2.0
Category:
Last pushed: Dec 18, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Hsword/Hetu"
Open to everyone: 100 requests/day with no key required. Get a free key to raise the limit to 1,000/day.
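The same endpoint can be called programmatically. Below is a minimal sketch in Python using only the standard library; the `X-API-Key` header name is an assumption (the listing does not say how a key is passed), so check the API's documentation before relying on it:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def build_url(category: str, owner: str, repo: str) -> str:
    """Construct the quality-endpoint URL for a repository."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category, owner, repo, api_key=None):
    """Fetch repo quality data as a dict; pass api_key for the higher rate limit."""
    req = urllib.request.Request(build_url(category, owner, repo))
    if api_key:
        # Header name is an assumption -- verify against the API docs.
        req.add_header("X-API-Key", api_key)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Keyless usage, within the 100 requests/day limit:
# data = fetch_quality("ml-frameworks", "Hsword", "Hetu")
```

Keeping URL construction in its own helper makes it easy to query other categories and repositories from the same service.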
Higher-rated alternatives
deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference...
helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
bsc-wdc/dislib
The Distributed Computing library for python implemented using PyCOMPSs programming model for HPC.