Kubernetes LLM Serving MLOps Tools

This page tracks 17 Kubernetes LLM serving tools. Four score above 50 (the established tier). The highest-rated is kubeflow/katib at 64/100 with 1,666 stars. Two of the top 10 are actively maintained.

Get all 17 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=mlops&subcategory=kubernetes-llm-serving&limit=20"
```

The API is open to everyone: 100 requests/day with no key, or 1,000/day with a free key.
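Once you have the JSON, a common next step is filtering projects by tier. A minimal sketch, assuming the response contains a `projects` array with `name`, `score`, and `tier` fields (this shape is inferred from the table on this page, not a documented schema):

```python
import json

# Sample payload standing in for the API response; the field names
# ("projects", "name", "score", "tier") are assumptions, not a
# documented schema. Values are taken from the table below.
sample_response = json.loads("""
{
  "projects": [
    {"name": "kubeflow/katib", "score": 64, "tier": "Established"},
    {"name": "scitix/arks", "score": 47, "tier": "Emerging"},
    {"name": "eren23/crucible", "score": 25, "tier": "Experimental"}
  ]
}
""")

def tools_in_tier(payload, tier):
    """Return (name, score) pairs for projects in the given tier."""
    return [(p["name"], p["score"])
            for p in payload.get("projects", [])
            if p.get("tier") == tier]

print(tools_in_tier(sample_response, "Established"))
# → [('kubeflow/katib', 64)]
```

Swap `sample_response` for the parsed body of the curl call above to filter live data the same way.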

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | kubeflow/katib | Automated Machine Learning on Kubernetes | 64 | Established |
| 2 | kubeai-project/kubeai | AI Inference Operator for Kubernetes. The easiest way to serve ML models in... | 59 | Established |
| 3 | sgl-project/rbg | A workload for deploying LLM inference services on Kubernetes | 57 | Established |
| 4 | beam-cloud/beta9 | Ultrafast serverless GPU inference, sandboxes, and background jobs | 55 | Established |
| 5 | ptimizeroracle/ondine | The LLM Dataset Engine — batch process millions of rows with 100+ providers.... | 48 | Emerging |
| 6 | scitix/arks | Arks is a cloud-native inference framework running on Kubernetes | 47 | Emerging |
| 7 | star-whale/starwhale | An MLOps/LLMOps platform | 45 | Emerging |
| 8 | defilantech/LLMKube | Kubernetes operator for GPU-accelerated LLM inference - air-gapped,... | 42 | Emerging |
| 9 | tensorchord/openmodelz | Autoscale LLM (vLLM, SGLang, LMDeploy) inferences on Kubernetes (and others) | 40 | Emerging |
| 10 | Climatik-Project/Climatik-Project | Carbon Limiting Auto Tuning for Kubernetes | 40 | Emerging |
| 11 | cloud-ai-ufcg/ai-engine | Workload migration recommendations engine. (CLI \| API) | 29 | Experimental |
| 12 | eren23/crucible | Autonomous ML research on rental GPUs — LLM-driven hypothesis generation and... | 25 | Experimental |
| 13 | depadeto/detoserve | Open-source multi-cluster AI inference platform. Define functions once,... | 23 | Experimental |
| 14 | kube-gopher/magma | Kubernetes Operator for AI model lifecycle automation — bridging Volcano and Kthena. | 22 | Experimental |
| 15 | adityonugrohoid/gpu-autoscale-inference | Scale-to-zero GPU inference platform — LLM serving on Kubernetes with... | 14 | Experimental |
| 16 | sakthismarther/matrixhub | 🔗 Accelerate AI inference with MatrixHub, a self-hosted model registry that... | 13 | Experimental |
| 17 | kodlan/LLM-zero-downtime-update | Kubernetes (Argo Rollouts) implementation for zero-downtime model updates... | 13 | Experimental |

Comparisons in this category