Kubernetes LLM Serving Tools

Tools and operators for deploying, scaling, and managing LLM inference workloads on Kubernetes clusters. Includes auto-scaling, GPU optimization, and production orchestration. Does NOT include general LLM SDKs, multi-provider abstractions, or non-Kubernetes deployment platforms.

There are 51 Kubernetes LLM serving tools tracked. 7 score above 50 (the established tier). The highest-rated is AlexsJones/llmfit at 69/100, with 15,685 stars and 4,266 monthly downloads. 2 of the top 10 are actively maintained.

Get all 51 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=kubernetes-llm-serving&limit=51"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
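A minimal Python sketch for consuming the endpoint above. The query parameters (`domain`, `subcategory`, `limit`) come from the curl example; the JSON response shape (a list of objects with `score` fields) and the `key` parameter name for authenticated requests are assumptions, not documented behavior.

```python
import urllib.parse

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain, subcategory, limit=51, api_key=None):
    """Assemble the dataset query URL; pass api_key for the 1,000 req/day tier.
    The `key` parameter name is an assumption."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    if api_key:
        params["key"] = api_key
    return BASE + "?" + urllib.parse.urlencode(params)

def established(projects, threshold=50):
    """Keep projects above the 'established' cutoff (score > 50 per the
    summary above); assumes each project dict carries a numeric 'score'."""
    return [p for p in projects if p.get("score", 0) > threshold]

url = build_url("llm-tools", "kubernetes-llm-serving")
# To fetch for real (requires network; response shape assumed):
#   import json, urllib.request
#   with urllib.request.urlopen(url) as resp:
#       top = established(json.load(resp))
```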

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | AlexsJones/llmfit | Hundreds of models & providers. One command to find what runs on your hardware. | 69 | Established |
| 2 | victordibia/llmx | An API for Chat Fine-Tuned Large Language Models (llm) | 59 | Established |
| 3 | Chen-zexi/vllm-cli | A command-line interface tool for serving LLMs using vLLM. | 57 | Established |
| 4 | InftyAI/llmaz | ☸️ Easy, advanced inference platform for large language models on... | 55 | Established |
| 5 | livehl/aimirror | 🚀 200× speed! The download accelerator for the AI era · full mirrors for Docker/PyPI/HuggingFace/CRAN · parallel chunking + smart caching to make downloads fly | 52 | Established |
| 6 | TakatoHonda/sui-lang | 粋 (Sui) - A programming language optimized for LLM code generation | 50 | Established |
| 7 | matrixhub-ai/matrixhub | An open-source, self-hosted AI model hub with Hugging Face compatibility,... | 50 | Established |
| 8 | ventz/easy-llms | Easy "1-line" calling of all LLMs from OpenAI, MS Azure, AWS Bedrock, GCP... | 48 | Emerging |
| 9 | r2d4/openlm | OpenAI-compatible Python client that can call any LLM | 47 | Emerging |
| 10 | llmariner/llmariner | Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs. | 47 | Emerging |
| 11 | cloud-apim/otoroshi-llm-extension | Connect, setup, secure and seamlessly manage LLM models using an... | 47 | Emerging |
| 12 | edwardcapriolo/deliverance | A Java-based inference engine | 45 | Emerging |
| 13 | kalavai-net/kalavai-client | Aggregates compute from spare GPU capacity | 44 | Emerging |
| 14 | sozercan/kubectl-ai | ✨ Kubectl plugin to create manifests with LLMs | 43 | Emerging |
| 15 | chigwell/llm7.io | LLM7.io offers a single API gateway that connects you to a wide array of... | 42 | Emerging |
| 16 | chenhunghan/ialacol | 🪶 Lightweight OpenAI drop-in replacement for Kubernetes | 41 | Emerging |
| 17 | EM-GeekLab/LLMOne | Enterprise-grade LLM automated deployment tool that makes AI servers truly... | 41 | Emerging |
| 18 | AntSeed/antseed | AntSeed P2P AI Services Network | 40 | Emerging |
| 19 | hkalbertkim/KORA | An Inference Operating System that reduces unnecessary LLM calls by... | 38 | Emerging |
| 20 | friendliai/friendli-client | [⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI | 37 | Emerging |
| 21 | jadnohra/hf-providers | Compare API providers, local GPUs, and cloud for any model | 36 | Emerging |
| 22 | paolobietolini/gtm-api-for-llms | This repository contains a structured, machine-readable reference of the... | 36 | Emerging |
| 23 | sozercan/k8s-distributed-inference | 🦄 Distributed Inference on Kubernetes with DRA and MIG | 34 | Emerging |
| 24 | windsnow1025/LLM-Bridge | A Python library that wraps multiple LLM providers into a consistent API... | 33 | Emerging |
| 25 | profullstack/infernet-protocol | Infernet: A Peer-to-Peer Distributed GPU Inference Protocol | 32 | Emerging |
| 26 | hwclass/docktor | AI-Native Autoscaler for Docker Compose built with cagent + MCP + Model Runner. | 31 | Emerging |
| 27 | bsilverthorn/vernac | Plain language programming language 📖 | 30 | Emerging |
| 28 | cloudglue/cloudglue-api-spec | Official OpenAPI specification for the Cloudglue API | 30 | Emerging |
| 29 | sanjbh/kube-scaling-agent | Kubernetes operator that uses LLM reasoning to autoscale deployments; reads... | 29 | Experimental |
| 30 | ngstcf/llmbase | Unified API for multiple LLM providers. Use as a Python library or HTTP API server. | 27 | Experimental |
| 31 | TrentPierce/Shard | Shard is a speculative inference accelerator that reduces GPU usage by... | 25 | Experimental |
| 32 | inferLean/inferlean-project | The copilot for LLM inference optimization | 25 | Experimental |
| 33 | inferscale/inferscale | A fully automated MLOps platform built to democratize AI/ML infrastructure | 25 | Experimental |
| 34 | cloud-apim/otoroshi-llm-extension-serverless-example | An example project to use Otoroshi LLM Extension in Cloud APIM Serverless | 22 | Experimental |
| 35 | mycellm/mycellm | Distributed LLM inference across heterogeneous hardware. Pool GPUs into a... | 22 | Experimental |
| 36 | umoja-compute/umoja-compute | Free OpenAI-compatible infrastructure for running open LLMs on distributed... | 22 | Experimental |
| 37 | David-Martel/PC-AI | Local LLM-powered PC diagnostics and optimization framework for Windows | 22 | Experimental |
| 38 | saurabhknp/air-gapped | Enable offline Kubernetes ops with a local AI agent that runs fully... | 22 | Experimental |
| 39 | AdieLaine/Model-Sliding | Enables the application to transition seamlessly between different OpenAI... | 20 | Experimental |
| 40 | kenahrens/ai-testing | Running AI Models in Kubernetes | 19 | Experimental |
| 41 | failfa-st/simplif-ai | A pseudolanguage to describe code for LLMs | 19 | Experimental |
| 42 | deepakdeo/python-llm-playbook | A unified Python interface for multiple LLM providers (OpenAI, Anthropic,... | 17 | Experimental |
| 43 | localllm-advisor/localllm-advisor | The free tool to find the best LLM for your hardware, or the best hardware... | 17 | Experimental |
| 44 | debarun1234/llm-model-eligibility-checker | A desktop application that analyzes your computer's specifications... | 15 | Experimental |
| 45 | Ptchwir3/Rookery | Turn any Kubernetes cluster into a private LLM endpoint. One Helm command... | 15 | Experimental |
| 46 | boufia/vllm-lan-inference | 🚀 Deliver OpenAI-compatible LLM inference on your LAN with vLLM and gateway... | 15 | Experimental |
| 47 | gowshikram/unified-llm-engine | ⚡ Streamline your AI integrations with a multi-provider LLM engine,... | 14 | Experimental |
| 48 | ait-testbed/playbookgen | A CLI tool for generating AttackMate playbooks using LLMs (currently... | 14 | Experimental |
| 49 | ai-art-dev99/vLLM-efficient-serving-stack | Production-grade vLLM serving with an OpenAI-compatible API, per-request... | 13 | Experimental |
| 50 | bonham000/kuzco-inference-client | Kuzco Inference Client ✨ | 11 | Experimental |
| 51 | wvhulle/demistify | Fine-tune an LLM like Llama 3.2 on obscure languages or codebases with Unsloth | 10 | Experimental |