heyfey/vodascheduler
GPU scheduler for elastic/distributed deep learning workloads in a Kubernetes cluster (IC2E'23)
This GPU scheduler is aimed at deep learning engineers and MLOps teams running large-scale, distributed training. It schedules elastic deep learning workloads and allocates GPU resources across a Kubernetes cluster, yielding shorter training times and more efficient use of costly GPU infrastructure, especially on dynamic cloud instances.
No commits in the last 6 months.
Use this if you are running many deep learning training jobs on a shared GPU cluster and need to maximize resource utilization and throughput, especially with fluctuating resources like spot instances.
Not ideal if you are running single-GPU training jobs or do not have a Kubernetes cluster configured for GPU scheduling.
Stars: 34
Forks: 2
Language: Go
License: Apache-2.0
Category:
Last pushed: Nov 11, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/heyfey/vodascheduler"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
skypilot-org/skypilot
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage...
dstackai/dstack
dstack is an open-source control plane for running development, training, and inference jobs on...
ray-project/kuberay
A toolkit to run Ray applications on Kubernetes
kubeflow/kale
Kubeflow’s superfood for Data Scientists
volcano-sh/volcano
A Cloud Native Batch System (Project under CNCF)