gsyang33/Driple

🚨 Prediction of the Resource Consumption of Distributed Deep Learning Systems


Emerging

This project helps machine learning infrastructure engineers and researchers estimate the resource consumption of distributed deep learning systems before deployment. It takes computational graph representations of deep learning models and system configuration details as input. It then predicts key resource metrics like GPU utilization, GPU memory usage, and network throughput, helping optimize system design and resource allocation for training workloads.

No commits in the last 6 months.

Use this if you need to predict the GPU utilization, GPU memory, and network throughput a distributed deep learning model will consume under a specific hardware and software configuration.

Not ideal if you are looking for a tool to optimize the deep learning model itself or to monitor real-time resource usage of already running systems.

deep-learning-operations MLOps resource-management performance-engineering distributed-training

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 7 / 25

Maturity 8 / 25

Community 20 / 25


Language: Python

License: none

Higher-rated alternatives

deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference...

helmholtz-analytics/heat

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python

hpcaitech/ColossalAI

Making large AI models cheaper, faster and more accessible

horovod/horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

bsc-wdc/dislib

The Distributed Computing library for python implemented using PyCOMPSs programming model for HPC.
