alibaba/tair-kvcache
Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global cache management, inference simulation (HiSim), and more.
This system helps AI developers and MLOps engineers optimize the performance and cost of Large Language Model (LLM) inference. It takes LLM inference requests and configuration data as input and produces managed key-value cache data and performance simulations. Its end users are typically engineers responsible for deploying and operating LLMs in production.
Use this if you run LLMs and need to manage their key-value caches efficiently to reduce cost and accelerate inference.
Not ideal if you need a general-purpose cache unrelated to LLM inference, or if you don't operate large-scale LLM deployments.
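To make the use case concrete, here is a minimal sketch of the core idea behind KV cache reuse in LLM serving: requests that share a token prefix (for example, a common system prompt) can skip recomputing attention keys and values for that prefix. Everything here, including the PrefixKVCache name, is illustrative and assumed; it is not tair-kvcache's actual API.

# Hypothetical sketch of prefix-based KV cache reuse -- illustrative only,
# not the tair-kvcache API.
import hashlib
from typing import Optional

class PrefixKVCache:
    """Maps token-prefix hashes to previously computed KV blocks."""

    def __init__(self) -> None:
        self._store: dict[str, list] = {}

    @staticmethod
    def _key(tokens: list[int]) -> str:
        return hashlib.sha256(str(tokens).encode("utf-8")).hexdigest()

    def put(self, tokens: list[int], kv_blocks: list) -> None:
        self._store[self._key(tokens)] = kv_blocks

    def longest_prefix(self, tokens: list[int]) -> tuple[int, Optional[list]]:
        """Return (matched_length, kv_blocks) for the longest cached prefix."""
        for n in range(len(tokens), 0, -1):
            hit = self._store.get(self._key(tokens[:n]))
            if hit is not None:
                return n, hit
        return 0, None

# Usage: a second request sharing a system prompt reuses its cached KV blocks.
cache = PrefixKVCache()
system_prompt = [1, 2, 3, 4]          # token ids of a shared prefix
cache.put(system_prompt, kv_blocks=["kv-block-0"])

request = system_prompt + [9, 10]     # new request extending the prefix
matched, blocks = cache.longest_prefix(request)
print(f"reused {matched} of {len(request)} tokens")  # reused 4 of 6 tokens

A production system like the one described above additionally has to place these blocks across memory tiers and nodes, evict them under pressure, and keep lookups cheap, which is where a dedicated global cache manager earns its keep.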
Stars: 96
Forks: 13
Language: C++
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/alibaba/tair-kvcache"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
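For consumers who prefer not to shell out, a minimal sketch of the same call using only Python's standard library follows. Only the URL comes from this page; the response schema and any API-key mechanism are assumptions and not documented here.

# Fetch the quality data for alibaba/tair-kvcache from the endpoint above.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/alibaba/tair-kvcache"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# The exact schema is not documented here, so just pretty-print the payload.
print(json.dumps(data, indent=2))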
Higher-rated alternatives
ModelEngine-Group/unified-cache-management
Persist and reuse KV cache to speed up your LLM.
reloadware/reloadium
Hot Reloading and Profiling for Python
October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
Zefan-Cai/Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
xcena-dev/maru
High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference