heyfey/vodascheduler
GPU scheduler for elastic/distributed deep learning workloads in a Kubernetes cluster (IC2E'23)
This GPU scheduler is aimed at deep learning engineers and MLOps teams running large-scale, distributed training. It schedules elastic deep learning workloads and allocates GPU resources across a Kubernetes cluster, yielding shorter training times and more efficient use of costly GPU infrastructure, especially on dynamic cloud instances.
No commits in the last 6 months.
Use this if you are running many deep learning training jobs on a shared GPU cluster and need to maximize resource utilization and throughput, especially with fluctuating resources like spot instances.
Not ideal if you are running single-GPU training jobs or do not have a Kubernetes cluster configured for GPU scheduling.
Stars: 34
Forks: 2
Language: Go
License: Apache-2.0
Category:
Last pushed: Nov 11, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/heyfey/vodascheduler"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
skypilot-org/skypilot
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage...
dstackai/dstack
dstack is an open-source control plane for running development, training, and inference jobs on...
ray-project/kuberay
A toolkit to run Ray applications on Kubernetes
kubeflow/kale
Kubeflow’s superfood for Data Scientists
volcano-sh/volcano
A Cloud Native Batch System (Project under CNCF)