xcena-dev/maru
High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference
Maru helps large language model (LLM) inference operations run faster and more efficiently by changing how KV caches are shared. Instead of copying data between LLM instances, Maru allows them to directly access a shared memory pool on CXL-enabled hardware. This reduces latency and improves hardware utilization for engineers managing and scaling LLM inference.
Use this if you are scaling LLM inference and need to reduce memory duplication, latency, and power consumption by enabling multiple LLM instances to share KV caches directly on CXL hardware.
Not ideal if your LLM inference environment does not use CXL-enabled hardware or if you do not face significant performance bottlenecks from KV cache sharing.
Stars: 38
Forks: 4
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/xcena-dev/maru"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
ModelEngine-Group/unified-cache-management
Persist and reuse KV Cache to speedup your LLM.
reloadware/reloadium
Hot Reloading and Profiling for Python
alibaba/tair-kvcache
Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global...
October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
Zefan-Cai/Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.