uw-mad-dash/shockwave
Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23]
Shockwave helps machine learning researchers and cluster administrators schedule deep learning training jobs on shared GPU clusters. Given configurations for machine learning workloads and scheduling policies, it produces schedules that balance fairness and resource efficiency, particularly when training jobs dynamically adapt their resource needs. It is aimed at researchers working on large-scale machine learning, especially those developing or deploying adaptive training methods.
No commits in the last 6 months.
Use this if you are a machine learning researcher or cluster administrator managing shared GPU clusters for deep learning and need to improve the fairness and efficiency of job scheduling, especially for workloads that dynamically adjust their resource requirements.
Not ideal if you are looking for a simple, out-of-the-box solution for general-purpose computing clusters or for scheduling tasks that are not deep learning training workloads.
Stars: 47
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Nov 24, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/uw-mad-dash/shockwave"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
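The same record can be fetched programmatically instead of via curl. A minimal sketch using Python's standard library; note that the JSON field names (`stars`, `forks`, `language`, `license`) are assumptions about the response shape, not a documented schema:

```python
import json
from urllib.request import urlopen

API_URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/uw-mad-dash/shockwave"

def fetch_repo_quality(url=API_URL):
    """Fetch the quality record for a repository (100 requests/day without a key)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)

def summarize(record):
    """Pick out a few fields of interest; the key names here are hypothetical."""
    return {k: record.get(k) for k in ("stars", "forks", "language", "license")}

# Demonstrate parsing on a sample payload (hypothetical shape), without a network call:
sample = json.loads('{"stars": 47, "forks": 1, "language": "Python", "license": "MIT"}')
print(summarize(sample))
```

Using `urlopen` with a timeout avoids hanging on a slow endpoint; swapping in `requests` would work the same way if it is available.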
Higher-rated alternatives
qualcomm/ai-hub-models
Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized...
petuum/adaptdl
Resource-adaptive cluster scheduler for deep learning training.
zszazi/Deep-learning-in-cloud
List of Deep Learning Cloud Providers
lincc-frameworks/hyrax
Hyrax - A low-code framework for rapid experimentation with ML & unsupervised discovery in astronomy
openhackathons-org/gpubootcamp
This repository contains GPU bootcamp material for HPC and AI