Kubernetes LLM Serving MLOps Tools
There are 17 Kubernetes LLM serving tools tracked. Four score above 50 (Established tier). The highest-rated is kubeflow/katib at 64/100 with 1,666 stars. Two of the top 10 are actively maintained.
Get all 17 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=mlops&subcategory=kubernetes-llm-serving&limit=20"
```
The API is open to everyone at 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
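Once fetched, the JSON can be filtered client-side, for example by quality tier. A minimal sketch follows; the `projects`, `name`, `score`, and `tier` field names are assumptions about the response schema (not confirmed by the API), and the sample payload below is illustrative except for katib's score of 64, which comes from the summary above.

```python
import json

# Hypothetical sample payload; field names ("projects", "name",
# "score", "tier") are assumed, and the arks score is made up
# for illustration. In practice this JSON would come from the
# curl request shown above.
sample_response = json.loads("""
{
  "projects": [
    {"name": "kubeflow/katib", "score": 64, "tier": "Established"},
    {"name": "scitix/arks", "score": 45, "tier": "Emerging"}
  ]
}
""")

def by_tier(payload, tier):
    """Return the names of all projects in the given quality tier."""
    return [p["name"] for p in payload["projects"] if p["tier"] == tier]

print(by_tier(sample_response, "Established"))  # ['kubeflow/katib']
```

The same pattern works for any of the three tiers (Established, Emerging, Experimental) listed in the table below.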
| # | Tool | Description | Tier |
|---|------|-------------|------|
| 1 | kubeflow/katib | Automated Machine Learning on Kubernetes | Established |
| 2 | kubeai-project/kubeai | AI Inference Operator for Kubernetes. The easiest way to serve ML models in... | Established |
| 3 | sgl-project/rbg | A workload for deploying LLM inference services on Kubernetes | Established |
| 4 | beam-cloud/beta9 | Ultrafast serverless GPU inference, sandboxes, and background jobs | Established |
| 5 | ptimizeroracle/ondine | The LLM Dataset Engine — batch process millions of rows with 100+ providers.... | Emerging |
| 6 | scitix/arks | Arks is a cloud-native inference framework running on Kubernetes | Emerging |
| 7 | star-whale/starwhale | An MLOps/LLMOps platform | Emerging |
| 8 | defilantech/LLMKube | Kubernetes operator for GPU-accelerated LLM inference - air-gapped,... | Emerging |
| 9 | tensorchord/openmodelz | Autoscale LLM (vLLM, SGLang, LMDeploy) inferences on Kubernetes (and others) | Emerging |
| 10 | Climatik-Project/Climatik-Project | Carbon Limiting Auto Tuning for Kubernetes | Emerging |
| 11 | cloud-ai-ufcg/ai-engine | Workload migration recommendations engine. (CLI \| API) | Experimental |
| 12 | eren23/crucible | Autonomous ML research on rental GPUs — LLM-driven hypothesis generation and... | Experimental |
| 13 | depadeto/detoserve | Open-source multi-cluster AI inference platform. Define functions once,... | Experimental |
| 14 | kube-gopher/magma | Kubernetes Operator for AI model lifecycle automation — bridging Volcano and Kthena. | Experimental |
| 15 | adityonugrohoid/gpu-autoscale-inference | Scale-to-zero GPU inference platform — LLM serving on Kubernetes with... | Experimental |
| 16 | sakthismarther/matrixhub | 🔗 Accelerate AI inference with MatrixHub, a self-hosted model registry that... | Experimental |
| 17 | kodlan/LLM-zero-downtime-update | Kubernetes (Argo Rollouts) implementation for zero-downtime model updates... | Experimental |