defilantech/LLMKube

Kubernetes operator for GPU-accelerated LLM inference - air-gapped, edge-native, production-ready

Score: 42 / 100 (Emerging)

LLMKube helps organizations deploy large language models (LLMs) on their own computing infrastructure, whether for privacy, cost control, or air-gapped compliance. It takes your chosen LLM and hardware specifications, then manages the entire deployment process, making the model available via a standard API. This is ideal for infrastructure engineers, MLOps teams, or application developers who need to integrate LLM inference into their products while maintaining full control over their data and hardware.

Use this if you need to run LLMs on your own servers or Macs, require advanced GPU scheduling and monitoring, or want to create a mixed environment using both NVIDIA and Apple Silicon GPUs.

Not ideal if you only need to run LLMs on a single local machine without Kubernetes, or if you prefer a fully managed cloud service for LLM inference.

Tags: MLOps · On-premise AI · GPU orchestration · Data privacy · Edge AI
No package registry listing · No dependents
Maintenance: 10 / 25
Adoption: 7 / 25
Maturity: 13 / 25
Community: 12 / 25


Stars: 29
Forks: 4
Language: Go
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/mlops/defilantech/LLMKube"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.