defilantech/LLMKube
Kubernetes operator for GPU-accelerated LLM inference - air-gapped, edge-native, production-ready
LLMKube helps organizations deploy large language models (LLMs) on their own computing infrastructure, whether for privacy, cost control, or air-gapped compliance. It takes your chosen LLM and hardware specifications, then manages the entire deployment process, making the model available via a standard API. This is ideal for infrastructure engineers, MLOps teams, or application developers who need to integrate LLM inference into their products while maintaining full control over their data and hardware.
Use this if you need to run LLMs on your own servers or Macs, require advanced GPU scheduling and monitoring, or want to create a mixed environment using both NVIDIA and Apple Silicon GPUs.
Not ideal if you only need to run LLMs on a single local machine without Kubernetes, or if you prefer a fully managed cloud service for LLM inference.
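Since this listing doesn't document the serving API itself, here is a minimal sketch of what querying a deployed model might look like, assuming an OpenAI-compatible completions endpoint (common for llama.cpp-based servers); the service name, port, endpoint path, and request shape are illustrative assumptions, not confirmed by this repo:

# Hypothetical: service name, port, and endpoint path are assumptions for illustration
curl http://llmkube-model.default.svc.cluster.local:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3-8b", "prompt": "Hello from the cluster", "max_tokens": 64}'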
Stars: 29
Forks: 4
Language: Go
License: Apache-2.0
Category: (none listed)
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/defilantech/LLMKube"
Open to everyone: 100 requests/day with no key required. A free API key raises the limit to 1,000 requests/day.
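The listing doesn't show how to pass a key. A common convention is a bearer token in the Authorization header, sketched below as an assumption; check the API docs for the actual header name:

# Hypothetical: the Authorization header scheme is an assumption, not documented here
curl -H "Authorization: Bearer $API_KEY" \
  "https://pt-edge.onrender.com/api/v1/quality/mlops/defilantech/LLMKube"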
Compare
Higher-rated alternatives
kubeflow/katib
Automated Machine Learning on Kubernetes
kubeai-project/kubeai
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports...
sgl-project/rbg
A workload for deploying LLM inference services on Kubernetes
beam-cloud/beta9
Ultrafast serverless GPU inference, sandboxes, and background jobs
optimizeroracle/ondine
The LLM Dataset Engine — batch process millions of rows with 100+ providers. Multi-row batching...