defilantech/LLMKube
Kubernetes operator for GPU-accelerated LLM inference - air-gapped, edge-native, production-ready
LLMKube helps organizations deploy large language models (LLMs) on their own computing infrastructure, whether for privacy, cost control, or air-gapped compliance. It takes your chosen LLM and hardware specifications, then manages the entire deployment process, making the model available via a standard API. This is ideal for infrastructure engineers, MLOps teams, or application developers who need to integrate LLM inference into their products while maintaining full control over their data and hardware.
Use this if you need to run LLMs on your own servers or Macs, require advanced GPU scheduling and monitoring, or want to create a mixed environment using both NVIDIA and Apple Silicon GPUs.
Not ideal if you only need to run LLMs on a single local machine without Kubernetes, or if you prefer a fully managed cloud service for LLM inference.
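Since this listing doesn't document the serving API itself, here is a minimal sketch of what querying a deployed model might look like, assuming an OpenAI-compatible completions endpoint (common for llama.cpp-based servers); the service name, port, endpoint path, and request shape are illustrative assumptions, not confirmed by this repo:

# Hypothetical: service name, port, and endpoint path are assumptions for illustration
curl http://llmkube-model.default.svc.cluster.local:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3-8b", "prompt": "Hello from the cluster", "max_tokens": 64}'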
Stars: 29
Forks: 4
Language: Go
License: Apache-2.0
Category: (none listed)
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/defilantech/LLMKube"
Open to everyone: 100 requests/day with no key required. A free API key raises the limit to 1,000 requests/day.
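The listing doesn't show how to pass a key. A common convention is a bearer token in the Authorization header, sketched below as an assumption; check the API docs for the actual header name:

# Hypothetical: the Authorization header scheme is an assumption, not documented here
curl -H "Authorization: Bearer $API_KEY" \
  "https://pt-edge.onrender.com/api/v1/quality/mlops/defilantech/LLMKube"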
Compare
Higher-rated alternatives
kubeflow/katib
Automated Machine Learning on Kubernetes
kubeai-project/kubeai
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports...
sgl-project/rbg
A workload for deploying LLM inference services on Kubernetes
beam-cloud/beta9
Ultrafast serverless GPU inference, sandboxes, and background jobs
optimizeroracle/ondine
The LLM Dataset Engine — batch process millions of rows with 100+ providers. Multi-row batching...