uw-mad-dash/shockwave
Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23]
Shockwave helps machine learning researchers and cluster administrators schedule deep learning training jobs on shared GPU clusters. Given configurations for machine learning workloads and scheduling policies, it produces schedules that balance fairness and resource efficiency, particularly when training jobs dynamically adapt their resource needs. It is aimed at researchers working on large-scale machine learning, especially those developing or deploying adaptive training methods.
No commits in the last 6 months.
Use this if you are a machine learning researcher or cluster administrator managing shared GPU clusters for deep learning and need to improve the fairness and efficiency of job scheduling, especially for workloads that dynamically adjust their resource requirements.
Not ideal if you are looking for a simple, out-of-the-box solution for general-purpose computing clusters or for scheduling tasks that are not deep learning training workloads.
Stars: 47
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Nov 24, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/uw-mad-dash/shockwave"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
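The same record can be fetched programmatically instead of via curl. A minimal sketch using Python's standard library; note that the JSON field names (`stars`, `forks`, `language`, `license`) are assumptions about the response shape, not a documented schema:

```python
import json
from urllib.request import urlopen

API_URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/uw-mad-dash/shockwave"

def fetch_repo_quality(url=API_URL):
    """Fetch the quality record for a repository (100 requests/day without a key)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)

def summarize(record):
    """Pick out a few fields of interest; the key names here are hypothetical."""
    return {k: record.get(k) for k in ("stars", "forks", "language", "license")}

# Demonstrate parsing on a sample payload (hypothetical shape), without a network call:
sample = json.loads('{"stars": 47, "forks": 1, "language": "Python", "license": "MIT"}')
print(summarize(sample))
```

Using `urlopen` with a timeout avoids hanging on a slow endpoint; swapping in `requests` would work the same way if it is available.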
Higher-rated alternatives
qualcomm/ai-hub-models
Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized...
petuum/adaptdl
Resource-adaptive cluster scheduler for deep learning training.
zszazi/Deep-learning-in-cloud
List of Deep Learning Cloud Providers
lincc-frameworks/hyrax
Hyrax - A low-code framework for rapid experimentation with ML & unsupervised discovery in astronomy
openhackathons-org/gpubootcamp
This repository contains GPU bootcamp material for HPC and AI