kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
This project helps large-scale AI service providers serve large language models (LLMs) more efficiently. It is built around the key-value cache (KVCache), the intermediate attention data that LLMs reuse while generating text, and organizes the serving stack so that this cache can be pooled, reused, and transferred across machines. Organizations running advanced AI models such as Kimi, or similar large-scale services, would use it to improve performance and reduce operational costs.
4,911 stars. Actively maintained with 111 commits in the last 30 days.
Use this if you are operating a large-scale AI service and need to optimize the performance and cost-efficiency of serving large language models.
Not ideal if you are developing small-scale AI applications or do not manage distributed, high-throughput LLM serving infrastructure.
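To make the KVCache idea above concrete, here is a minimal, illustrative Python sketch of prefix reuse, the general pattern that a pooled KVCache accelerates. This is not Mooncake's API or data layout; the cache keys, "tensor" stand-ins, and function names are all hypothetical.

# Minimal sketch of why a KVCache matters for LLM serving.
# Illustrative only; NOT Mooncake's API or data layout.
from typing import Dict, List, Tuple

# Map a token-prefix key to the key/value data computed for it.
# In a real system these are per-layer GPU tensors; floats stand in here.
KVCache = Dict[Tuple[int, ...], List[float]]

cache: KVCache = {}

def compute_kv(tokens: Tuple[int, ...]) -> List[float]:
    """Stand-in for the expensive prefill pass that builds K/V tensors."""
    return [float(t) for t in tokens]  # placeholder "tensors"

def serve(prompt_tokens: List[int]) -> List[float]:
    """Reuse cached K/V for the longest previously seen prefix,
    then compute (and cache) only the remainder."""
    tokens = tuple(prompt_tokens)
    for cut in range(len(tokens), 0, -1):  # try longest prefix first
        if tokens[:cut] in cache:
            reused = cache[tokens[:cut]]
            remainder = compute_kv(tokens[cut:])
            kv = reused + remainder
            cache[tokens] = kv
            return kv
    kv = compute_kv(tokens)  # cold miss: full prefill
    cache[tokens] = kv
    return kv

serve([1, 2, 3, 4])     # cache miss: full prefill
serve([1, 2, 3, 4, 5])  # reuses the cached 4-token prefix, computes one step

Mooncake applies this idea at datacenter scale, sharing cache capacity across CPU, DRAM, and storage rather than within a single process as sketched here.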
Stars: 4,911
Forks: 600
Language: C++
License: Apache-2.0
Category: llm-tools
Last pushed: Mar 13, 2026
Commits (30d): 111
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/kvcache-ai/Mooncake"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
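For programmatic access, here is a minimal Python sketch using only the standard library. It assumes the endpoint returns JSON and prints the raw payload rather than guessing at field names; how to attach an API key for the higher rate limit is not documented here, so it is not shown.

# Fetch the same quality data as the curl command above.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/kvcache-ai/Mooncake"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)  # assumes a JSON response body

print(json.dumps(data, indent=2))  # inspect the payload before relying on fields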
Related tools
vllm-project/vllm-ascend
Community-maintained hardware plugin for vLLM on Ascend
SemiAnalysisAI/InferenceX
Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X...
uccl-project/uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache...
sophgo/tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.