kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
This project helps large-scale AI service providers serve large language models (LLMs) more efficiently. It is built around the key-value cache (KVCache), the intermediate attention data that LLMs reuse while generating text, and organizes the serving stack so that this cache can be pooled, reused, and transferred across machines. Organizations running advanced AI models such as Kimi, or similar large-scale services, would use it to improve performance and reduce operational costs.
4,911 stars. Actively maintained with 111 commits in the last 30 days.
Use this if you are operating a large-scale AI service and need to optimize the performance and cost-efficiency of serving large language models.
Not ideal if you are developing small-scale AI applications or do not manage distributed, high-throughput LLM serving infrastructure.
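To make the KVCache idea above concrete, here is a minimal, illustrative Python sketch of prefix reuse, the general pattern that a pooled KVCache accelerates. This is not Mooncake's API or data layout; the cache keys, "tensor" stand-ins, and function names are all hypothetical.

# Minimal sketch of why a KVCache matters for LLM serving.
# Illustrative only; NOT Mooncake's API or data layout.
from typing import Dict, List, Tuple

# Map a token-prefix key to the key/value data computed for it.
# In a real system these are per-layer GPU tensors; floats stand in here.
KVCache = Dict[Tuple[int, ...], List[float]]

cache: KVCache = {}

def compute_kv(tokens: Tuple[int, ...]) -> List[float]:
    """Stand-in for the expensive prefill pass that builds K/V tensors."""
    return [float(t) for t in tokens]  # placeholder "tensors"

def serve(prompt_tokens: List[int]) -> List[float]:
    """Reuse cached K/V for the longest previously seen prefix,
    then compute (and cache) only the remainder."""
    tokens = tuple(prompt_tokens)
    for cut in range(len(tokens), 0, -1):  # try longest prefix first
        if tokens[:cut] in cache:
            reused = cache[tokens[:cut]]
            remainder = compute_kv(tokens[cut:])
            kv = reused + remainder
            cache[tokens] = kv
            return kv
    kv = compute_kv(tokens)  # cold miss: full prefill
    cache[tokens] = kv
    return kv

serve([1, 2, 3, 4])     # cache miss: full prefill
serve([1, 2, 3, 4, 5])  # reuses the cached 4-token prefix, computes one step

Mooncake applies this idea at datacenter scale, sharing cache capacity across CPU, DRAM, and storage rather than within a single process as sketched here.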
Stars: 4,911
Forks: 600
Language: C++
License: Apache-2.0
Category: llm-tools
Last pushed: Mar 13, 2026
Commits (30d): 111
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/kvcache-ai/Mooncake"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
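For programmatic access, here is a minimal Python sketch using only the standard library. It assumes the endpoint returns JSON and prints the raw payload rather than guessing at field names; how to attach an API key for the higher rate limit is not documented here, so it is not shown.

# Fetch the same quality data as the curl command above.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/kvcache-ai/Mooncake"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)  # assumes a JSON response body

print(json.dumps(data, indent=2))  # inspect the payload before relying on fields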
Related tools
vllm-project/vllm-ascend
Community-maintained hardware plugin for vLLM on Ascend
SemiAnalysisAI/InferenceX
Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X...
uccl-project/uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache...
sophgo/tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.