gsyang33/Driple
🚨 Prediction of the Resource Consumption of Distributed Deep Learning Systems
This project helps machine learning infrastructure engineers and researchers estimate the resource consumption of distributed deep learning systems before deployment. It takes computational graph representations of deep learning models and system configuration details as input. It then predicts key resource metrics like GPU utilization, GPU memory usage, and network throughput, helping optimize system design and resource allocation for training workloads.
No commits in the last 6 months.
Use this if you need to predict the GPU utilization, GPU memory, and network throughput that a distributed deep learning model will require under a specific hardware and software configuration.
Not ideal if you are looking for a tool to optimize the deep learning model itself or to monitor real-time resource usage of already running systems.
Stars: 32
Forks: 26
Language: Python
License: (none listed)
Category: (not shown)
Last pushed: Feb 06, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/gsyang33/Driple"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
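The same request can be made from Python. The sketch below is a minimal illustration using only the standard library. It assumes the URL path follows a category/owner/repo pattern (inferred from the single curl example above, not documented here) and that the response body is JSON (also an assumption, so the code falls back to raw text). The function names `build_quality_url` and `fetch_quality` are hypothetical helpers, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the segment layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL from its path segments, percent-encoding each one."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint without an API key.

    Returns parsed JSON if the body is JSON (an assumption), otherwise the raw text.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to perform the request.
    print(build_quality_url("ml-frameworks", "gsyang33", "Driple"))
```

Keeping URL construction separate from the network call makes it easy to check the request target offline and to stay within the keyless daily limit while experimenting.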
Higher-rated alternatives
deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference...
helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
bsc-wdc/dislib
The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.