Kubernetes LLM Serving Tools

Tools and operators for deploying, scaling, and managing LLM inference workloads on Kubernetes clusters. Includes auto-scaling, GPU optimization, and production orchestration. Does NOT include general LLM SDKs, multi-provider abstractions, or non-Kubernetes deployment platforms.

There are 51 Kubernetes LLM serving tools tracked. 7 score above 50 (the established tier). The highest-rated is AlexsJones/llmfit at 69/100, with 15,685 stars and 4,266 monthly downloads. 2 of the top 10 are actively maintained.

Get all 51 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=kubernetes-llm-serving&limit=51"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
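A minimal Python sketch for consuming the endpoint above. The query parameters (`domain`, `subcategory`, `limit`) come from the curl example; the JSON response shape (a list of objects with `score` fields) and the `key` parameter name for authenticated requests are assumptions, not documented behavior.

```python
import urllib.parse

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain, subcategory, limit=51, api_key=None):
    """Assemble the dataset query URL; pass api_key for the 1,000 req/day tier.
    The `key` parameter name is an assumption."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    if api_key:
        params["key"] = api_key
    return BASE + "?" + urllib.parse.urlencode(params)

def established(projects, threshold=50):
    """Keep projects above the 'established' cutoff (score > 50 per the
    summary above); assumes each project dict carries a numeric 'score'."""
    return [p for p in projects if p.get("score", 0) > threshold]

url = build_url("llm-tools", "kubernetes-llm-serving")
# To fetch for real (requires network; response shape assumed):
#   import json, urllib.request
#   with urllib.request.urlopen(url) as resp:
#       top = established(json.load(resp))
```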

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | AlexsJones/llmfit | Hundreds of models & providers. One command to find what runs on your hardware. | 69 | Established |
| 2 | victordibia/llmx | An API for Chat Fine-Tuned Large Language Models (llm) | 59 | Established |
| 3 | Chen-zexi/vllm-cli | A command-line interface tool for serving LLMs using vLLM. | 57 | Established |
| 4 | InftyAI/llmaz | ☸️ Easy, advanced inference platform for large language models on... | 55 | Established |
| 5 | livehl/aimirror | 🚀 200× speed! The download accelerator for the AI era · full mirrors for Docker/PyPI/HuggingFace/CRAN · parallel chunking + smart caching to make downloads fly | 52 | Established |
| 6 | TakatoHonda/sui-lang | 粋 (Sui) - A programming language optimized for LLM code generation | 50 | Established |
| 7 | matrixhub-ai/matrixhub | An open-source, self-hosted AI model hub with Hugging Face compatibility,... | 50 | Established |
| 8 | ventz/easy-llms | Easy "1-line" calling of all LLMs from OpenAI, MS Azure, AWS Bedrock, GCP... | 48 | Emerging |
| 9 | r2d4/openlm | OpenAI-compatible Python client that can call any LLM | 47 | Emerging |
| 10 | llmariner/llmariner | Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs. | 47 | Emerging |
| 11 | cloud-apim/otoroshi-llm-extension | Connect, setup, secure and seamlessly manage LLM models using an... | 47 | Emerging |
| 12 | edwardcapriolo/deliverance | A Java-based inference engine | 45 | Emerging |
| 13 | kalavai-net/kalavai-client | Aggregates compute from spare GPU capacity | 44 | Emerging |
| 14 | sozercan/kubectl-ai | ✨ Kubectl plugin to create manifests with LLMs | 43 | Emerging |
| 15 | chigwell/llm7.io | LLM7.io offers a single API gateway that connects you to a wide array of... | 42 | Emerging |
| 16 | chenhunghan/ialacol | 🪶 Lightweight OpenAI drop-in replacement for Kubernetes | 41 | Emerging |
| 17 | EM-GeekLab/LLMOne | Enterprise-grade LLM automated deployment tool that makes AI servers truly... | 41 | Emerging |
| 18 | AntSeed/antseed | AntSeed P2P AI Services Network | 40 | Emerging |
| 19 | hkalbertkim/KORA | An Inference Operating System that reduces unnecessary LLM calls by... | 38 | Emerging |
| 20 | friendliai/friendli-client | [⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI | 37 | Emerging |
| 21 | jadnohra/hf-providers | Compare API providers, local GPUs, and cloud for any model | 36 | Emerging |
| 22 | paolobietolini/gtm-api-for-llms | This repository contains a structured, machine-readable reference of the... | 36 | Emerging |
| 23 | sozercan/k8s-distributed-inference | 🦄 Distributed Inference on Kubernetes with DRA and MIG | 34 | Emerging |
| 24 | windsnow1025/LLM-Bridge | A Python library that wraps multiple LLM providers into a consistent API... | 33 | Emerging |
| 25 | profullstack/infernet-protocol | Infernet: A Peer-to-Peer Distributed GPU Inference Protocol | 32 | Emerging |
| 26 | hwclass/docktor | AI-Native Autoscaler for Docker Compose built with cagent + MCP + Model Runner. | 31 | Emerging |
| 27 | bsilverthorn/vernac | Plain language programming language 📖 | 30 | Emerging |
| 28 | cloudglue/cloudglue-api-spec | Official OpenAPI specification for the Cloudglue API | 30 | Emerging |
| 29 | sanjbh/kube-scaling-agent | Kubernetes operator that uses LLM reasoning to autoscale deployments; reads... | 29 | Experimental |
| 30 | ngstcf/llmbase | Unified API for multiple LLM providers. Use as a Python library or HTTP API server. | 27 | Experimental |
| 31 | TrentPierce/Shard | Shard is a speculative inference accelerator that reduces GPU usage by... | 25 | Experimental |
| 32 | inferLean/inferlean-project | The copilot for LLM inference optimization | 25 | Experimental |
| 33 | inferscale/inferscale | A fully automated MLOps platform built to democratize AI/ML infrastructure | 25 | Experimental |
| 34 | cloud-apim/otoroshi-llm-extension-serverless-example | An example project to use Otoroshi LLM Extension in Cloud APIM Serverless | 22 | Experimental |
| 35 | mycellm/mycellm | Distributed LLM inference across heterogeneous hardware. Pool GPUs into a... | 22 | Experimental |
| 36 | umoja-compute/umoja-compute | Free OpenAI-compatible infrastructure for running open LLMs on distributed... | 22 | Experimental |
| 37 | David-Martel/PC-AI | Local LLM-powered PC diagnostics and optimization framework for Windows | 22 | Experimental |
| 38 | saurabhknp/air-gapped | Enable offline Kubernetes ops with a local AI agent that runs fully... | 22 | Experimental |
| 39 | AdieLaine/Model-Sliding | Enables the application to transition seamlessly between different OpenAI... | 20 | Experimental |
| 40 | kenahrens/ai-testing | Running AI Models in Kubernetes | 19 | Experimental |
| 41 | failfa-st/simplif-ai | A pseudolanguage to describe code for LLMs | 19 | Experimental |
| 42 | deepakdeo/python-llm-playbook | A unified Python interface for multiple LLM providers (OpenAI, Anthropic,... | 17 | Experimental |
| 43 | localllm-advisor/localllm-advisor | The free tool to find the best LLM for your hardware, or the best hardware... | 17 | Experimental |
| 44 | debarun1234/llm-model-eligibility-checker | A desktop application that analyzes your computer's specifications... | 15 | Experimental |
| 45 | Ptchwir3/Rookery | Turn any Kubernetes cluster into a private LLM endpoint. One Helm command... | 15 | Experimental |
| 46 | boufia/vllm-lan-inference | 🚀 Deliver OpenAI-compatible LLM inference on your LAN with vLLM and gateway... | 15 | Experimental |
| 47 | gowshikram/unified-llm-engine | ⚡ Streamline your AI integrations with a multi-provider LLM engine,... | 14 | Experimental |
| 48 | ait-testbed/playbookgen | A CLI tool for generating AttackMate playbooks using LLMs (currently... | 14 | Experimental |
| 49 | ai-art-dev99/vLLM-efficient-serving-stack | Production-grade vLLM serving with an OpenAI-compatible API, per-request... | 13 | Experimental |
| 50 | bonham000/kuzco-inference-client | Kuzco Inference Client ✨ | 11 | Experimental |
| 51 | wvhulle/demistify | Fine-tune an LLM like Llama 3.2 on obscure languages or codebases with Unsloth | 10 | Experimental |