sgl-project/rbg
A workload for deploying LLM inference services on Kubernetes
This Kubernetes API helps with deploying and managing complex, multi-component AI inference systems, especially large language models (LLMs). It takes your specifications for different parts of an LLM service (like prefill and decode roles) and outputs a stable, coordinated, and high-performance deployment on your Kubernetes cluster. It's designed for operations engineers or MLOps teams managing production AI services.
Use this if you need to reliably deploy and operate stateful, performance-sensitive, and multi-role LLM inference services on Kubernetes, ensuring proper coordination and resource utilization.
Not ideal if you are deploying simple, single-component applications or if your AI inference workloads do not require intricate multi-role coordination and topology awareness.
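To make the "multi-role" idea concrete, a deployment like this is typically described as a single custom resource whose spec lists each role (prefill, decode) with its own replica count and pod template. The sketch below is purely illustrative: the apiVersion, kind, and field names are assumptions, not the project's actual schema, so consult the sgl-project/rbg documentation for the real CRD.

```yaml
# Hypothetical sketch of a role-based LLM inference group.
# Kind, apiVersion, and all field names are illustrative assumptions.
apiVersion: workloads.example.io/v1alpha1
kind: RoleBasedGroup
metadata:
  name: llm-service
spec:
  roles:
    - name: prefill            # role that processes incoming prompts
      replicas: 2
      template:
        spec:
          containers:
            - name: inference
              image: example.com/sglang:latest   # placeholder image
              resources:
                limits:
                  nvidia.com/gpu: 1
    - name: decode             # role that generates output tokens
      replicas: 4
      template:
        spec:
          containers:
            - name: inference
              image: example.com/sglang:latest   # placeholder image
              resources:
                limits:
                  nvidia.com/gpu: 1
```

The point of grouping both roles in one resource is that the controller can coordinate them as a unit: scaling, rollout ordering, and scheduling decisions can account for the dependency between prefill and decode instead of treating them as unrelated Deployments.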
Stars: 187
Forks: 47
Language: Go
License: Apache-2.0
Category: (not listed)
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/sgl-project/rbg"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
Related tools
kubeflow/katib
Automated Machine Learning on Kubernetes
kubeai-project/kubeai
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports...
beam-cloud/beta9
Ultrafast serverless GPU inference, sandboxes, and background jobs
optimizeroracle/ondine
The LLM Dataset Engine — batch process millions of rows with 100+ providers. Multi-row batching...
scitix/arks
Arks is a cloud-native inference framework running on Kubernetes