sgl-project/rbg
A workload for deploying LLM inference services on Kubernetes
This Kubernetes API helps with deploying and managing complex, multi-component AI inference systems, especially large language models (LLMs). It takes your specifications for different parts of an LLM service (like prefill and decode roles) and outputs a stable, coordinated, and high-performance deployment on your Kubernetes cluster. It's designed for operations engineers or MLOps teams managing production AI services.
Use this if you need to reliably deploy and operate stateful, performance-sensitive, and multi-role LLM inference services on Kubernetes, ensuring proper coordination and resource utilization.
Not ideal if you are deploying simple, single-component applications or if your AI inference workloads do not require intricate multi-role coordination and topology awareness.
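To make the "multi-role" idea concrete, a deployment like this is typically described as a single custom resource whose spec lists each role (prefill, decode) with its own replica count and pod template. The sketch below is purely illustrative: the apiVersion, kind, and field names are assumptions, not the project's actual schema, so consult the sgl-project/rbg documentation for the real CRD.

```yaml
# Hypothetical sketch of a role-based LLM inference group.
# Kind, apiVersion, and all field names are illustrative assumptions.
apiVersion: workloads.example.io/v1alpha1
kind: RoleBasedGroup
metadata:
  name: llm-service
spec:
  roles:
    - name: prefill            # role that processes incoming prompts
      replicas: 2
      template:
        spec:
          containers:
            - name: inference
              image: example.com/sglang:latest   # placeholder image
              resources:
                limits:
                  nvidia.com/gpu: 1
    - name: decode             # role that generates output tokens
      replicas: 4
      template:
        spec:
          containers:
            - name: inference
              image: example.com/sglang:latest   # placeholder image
              resources:
                limits:
                  nvidia.com/gpu: 1
```

The point of grouping both roles in one resource is that the controller can coordinate them as a unit: scaling, rollout ordering, and scheduling decisions can account for the dependency between prefill and decode instead of treating them as unrelated Deployments.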
Stars: 187
Forks: 47
Language: Go
License: Apache-2.0
Category: (not listed)
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/sgl-project/rbg"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
Related tools
kubeflow/katib
Automated Machine Learning on Kubernetes
kubeai-project/kubeai
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports...
beam-cloud/beta9
Ultrafast serverless GPU inference, sandboxes, and background jobs
optimizeroracle/ondine
The LLM Dataset Engine — batch process millions of rows with 100+ providers. Multi-row batching...
scitix/arks
Arks is a cloud-native inference framework running on Kubernetes