heyfey/vodascheduler

GPU scheduler for elastic/distributed deep learning workloads in a Kubernetes cluster (IC2E'23)

Score: 29 / 100 (Experimental)

This is a GPU scheduler for deep learning engineers and MLOps teams managing large-scale, distributed training. It allocates GPU resources for elastic deep learning workloads across a Kubernetes cluster, aiming for shorter training times and more efficient use of costly GPU infrastructure, especially when resources such as dynamic cloud instances fluctuate.

No commits in the last 6 months.

Use this if you are running many deep learning training jobs on a shared GPU cluster and need to maximize resource utilization and throughput, especially with fluctuating resources like spot instances.

Not ideal if you are running single-GPU training jobs or do not have a Kubernetes cluster configured for GPU scheduling.

deep-learning-operations gpu-resource-management distributed-training mlops cloud-infrastructure
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 6 / 25


Stars: 34
Forks: 2
Language: Go
License: Apache-2.0
Last pushed: Nov 11, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/mlops/heyfey/vodascheduler"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
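The same endpoint can be called programmatically. A minimal sketch in Go, assuming only the URL shown in the curl example above; the `buildURL` helper and its category/owner/repo parameters are illustrative, and since the response schema is not documented here, the sketch just prints the raw body:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// buildURL assembles the quality endpoint path seen in the curl example.
// The parameter split (category/owner/repo) is an assumption based on
// that example, not documented API behavior.
func buildURL(category, owner, repo string) string {
	return fmt.Sprintf("https://pt-edge.onrender.com/api/v1/quality/%s/%s/%s",
		category, owner, repo)
}

func main() {
	url := buildURL("mlops", "heyfey", "vodascheduler")

	resp, err := http.Get(url)
	if err != nil {
		// No key is needed at the 100 requests/day tier, so a plain GET suffices.
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(string(body)) // raw JSON payload, schema not specified here
}
```

For the 1,000 requests/day tier, a free key is required; how the key is passed (header vs. query parameter) is not specified on this page.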