uw-mad-dash/shockwave

Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23]

Quality score: 27 / 100 (Experimental)

This project helps machine learning researchers and cluster administrators manage and schedule deep learning training jobs on shared GPU clusters. It takes in configurations for machine learning workloads and scheduling policies, and outputs optimized schedules that balance fairness and efficient resource utilization, especially when training jobs dynamically adapt their resource needs. It is aimed at researchers working on large-scale machine learning, particularly those developing or deploying adaptive training methods.

No commits in the last 6 months.

Use this if you are a machine learning researcher or cluster administrator managing shared GPU clusters for deep learning and need to improve the fairness and efficiency of job scheduling, especially for workloads that dynamically adjust their resource requirements.

Not ideal if you are looking for a simple, out-of-the-box solution for general-purpose computing clusters or for scheduling tasks that are not deep learning training workloads.

Tags: deep-learning-research, GPU-cluster-management, ML-resource-scheduling, adaptive-training, distributed-ML
Flags: Stale (6 months), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 8 / 25
Maturity: 16 / 25
Community: 3 / 25


Stars: 47
Forks: 1
Language: Python
License: MIT
Last pushed: Nov 24, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/uw-mad-dash/shockwave"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
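For programmatic access, the same endpoint shown in the curl command above can be queried from Python. A minimal sketch follows; the URL template is taken from the listing, but the response schema and any field names are assumptions, since the API's response format is not documented here.

```python
import json
import urllib.request

# Endpoint template taken from the curl example in the listing above.
API = "https://pt-edge.onrender.com/api/v1/quality/{category}/{owner}/{repo}"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-report URL for a given repository."""
    return API.format(category=category, owner=owner, repo=repo)


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch the quality report as JSON.

    Assumes the endpoint returns a JSON object; field names such as
    'score' are not documented and would need to be inspected.
    """
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)


url = quality_url("ml-frameworks", "uw-mad-dash", "shockwave")
print(url)
```

Unauthenticated requests are limited to 100 per day, so a script polling many repositories would need a free API key for the higher 1,000/day limit.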