alibaba/tair-kvcache
Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global cache management, inference simulation (HiSim), and more.
This system helps AI developers and MLOps engineers optimize the performance and cost of Large Language Model (LLM) inference. It takes LLM inference requests and configuration data as input and produces managed key-value cache data and performance simulations. Its end users are typically engineers responsible for deploying and operating LLMs in production.
Use this if you run LLMs and need to manage their key-value caches efficiently to reduce cost and accelerate inference.
Not ideal if you need a general-purpose cache unrelated to LLM inference, or if you don't operate large-scale LLM deployments.
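To make the use case concrete, here is a minimal sketch of the core idea behind KV cache reuse in LLM serving: requests that share a token prefix (for example, a common system prompt) can skip recomputing attention keys and values for that prefix. Everything here, including the PrefixKVCache name, is illustrative and assumed; it is not tair-kvcache's actual API.

# Hypothetical sketch of prefix-based KV cache reuse -- illustrative only,
# not the tair-kvcache API.
import hashlib
from typing import Optional

class PrefixKVCache:
    """Maps token-prefix hashes to previously computed KV blocks."""

    def __init__(self) -> None:
        self._store: dict[str, list] = {}

    @staticmethod
    def _key(tokens: list[int]) -> str:
        return hashlib.sha256(str(tokens).encode("utf-8")).hexdigest()

    def put(self, tokens: list[int], kv_blocks: list) -> None:
        self._store[self._key(tokens)] = kv_blocks

    def longest_prefix(self, tokens: list[int]) -> tuple[int, Optional[list]]:
        """Return (matched_length, kv_blocks) for the longest cached prefix."""
        for n in range(len(tokens), 0, -1):
            hit = self._store.get(self._key(tokens[:n]))
            if hit is not None:
                return n, hit
        return 0, None

# Usage: a second request sharing a system prompt reuses its cached KV blocks.
cache = PrefixKVCache()
system_prompt = [1, 2, 3, 4]          # token ids of a shared prefix
cache.put(system_prompt, kv_blocks=["kv-block-0"])

request = system_prompt + [9, 10]     # new request extending the prefix
matched, blocks = cache.longest_prefix(request)
print(f"reused {matched} of {len(request)} tokens")  # reused 4 of 6 tokens

A production system like the one described above additionally has to place these blocks across memory tiers and nodes, evict them under pressure, and keep lookups cheap, which is where a dedicated global cache manager earns its keep.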
Stars: 96
Forks: 13
Language: C++
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/alibaba/tair-kvcache"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
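For consumers who prefer not to shell out, a minimal sketch of the same call using only Python's standard library follows. Only the URL comes from this page; the response schema and any API-key mechanism are assumptions and not documented here.

# Fetch the quality data for alibaba/tair-kvcache from the endpoint above.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/alibaba/tair-kvcache"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# The exact schema is not documented here, so just pretty-print the payload.
print(json.dumps(data, indent=2))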
Higher-rated alternatives
ModelEngine-Group/unified-cache-management
Persist and reuse KV cache to speed up your LLM.
reloadware/reloadium
Hot Reloading and Profiling for Python
October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
Zefan-Cai/Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
xcena-dev/maru
High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference