Kubernetes LLM Serving MLOps Tools

This page tracks 17 Kubernetes LLM serving tools. Four score above 50 (the established tier). The highest-rated is kubeflow/katib at 64/100 with 1,666 stars. Two of the top 10 are actively maintained.

Get all 17 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=mlops&subcategory=kubernetes-llm-serving&limit=20"
```

The API is open to everyone: 100 requests/day with no key, or 1,000/day with a free key.
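Once you have the JSON, a common next step is filtering projects by tier. A minimal sketch, assuming the response contains a `projects` array with `name`, `score`, and `tier` fields (this shape is inferred from the table on this page, not a documented schema):

```python
import json

# Sample payload standing in for the API response; the field names
# ("projects", "name", "score", "tier") are assumptions, not a
# documented schema. Values are taken from the table below.
sample_response = json.loads("""
{
  "projects": [
    {"name": "kubeflow/katib", "score": 64, "tier": "Established"},
    {"name": "scitix/arks", "score": 47, "tier": "Emerging"},
    {"name": "eren23/crucible", "score": 25, "tier": "Experimental"}
  ]
}
""")

def tools_in_tier(payload, tier):
    """Return (name, score) pairs for projects in the given tier."""
    return [(p["name"], p["score"])
            for p in payload.get("projects", [])
            if p.get("tier") == tier]

print(tools_in_tier(sample_response, "Established"))
# → [('kubeflow/katib', 64)]
```

Swap `sample_response` for the parsed body of the curl call above to filter live data the same way.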

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | kubeflow/katib | Automated Machine Learning on Kubernetes | 64 | Established |
| 2 | kubeai-project/kubeai | AI Inference Operator for Kubernetes. The easiest way to serve ML models in... | 59 | Established |
| 3 | sgl-project/rbg | A workload for deploying LLM inference services on Kubernetes | 57 | Established |
| 4 | beam-cloud/beta9 | Ultrafast serverless GPU inference, sandboxes, and background jobs | 55 | Established |
| 5 | ptimizeroracle/ondine | The LLM Dataset Engine — batch process millions of rows with 100+ providers.... | 48 | Emerging |
| 6 | scitix/arks | Arks is a cloud-native inference framework running on Kubernetes | 47 | Emerging |
| 7 | star-whale/starwhale | An MLOps/LLMOps platform | 45 | Emerging |
| 8 | defilantech/LLMKube | Kubernetes operator for GPU-accelerated LLM inference - air-gapped,... | 42 | Emerging |
| 9 | tensorchord/openmodelz | Autoscale LLM (vLLM, SGLang, LMDeploy) inferences on Kubernetes (and others) | 40 | Emerging |
| 10 | Climatik-Project/Climatik-Project | Carbon Limiting Auto Tuning for Kubernetes | 40 | Emerging |
| 11 | cloud-ai-ufcg/ai-engine | Workload migration recommendations engine. (CLI \| API) | 29 | Experimental |
| 12 | eren23/crucible | Autonomous ML research on rental GPUs — LLM-driven hypothesis generation and... | 25 | Experimental |
| 13 | depadeto/detoserve | Open-source multi-cluster AI inference platform. Define functions once,... | 23 | Experimental |
| 14 | kube-gopher/magma | Kubernetes Operator for AI model lifecycle automation — bridging Volcano and Kthena. | 22 | Experimental |
| 15 | adityonugrohoid/gpu-autoscale-inference | Scale-to-zero GPU inference platform — LLM serving on Kubernetes with... | 14 | Experimental |
| 16 | sakthismarther/matrixhub | 🔗 Accelerate AI inference with MatrixHub, a self-hosted model registry that... | 13 | Experimental |
| 17 | kodlan/LLM-zero-downtime-update | Kubernetes (Argo Rollouts) implementation for zero-downtime model updates... | 13 | Experimental |

Comparisons in this category