xcena-dev/maru
High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference
Maru helps large language model (LLM) inference operations run faster and more efficiently by changing how KV caches are shared. Instead of copying data between LLM instances, Maru allows them to directly access a shared memory pool on CXL-enabled hardware. This reduces latency and improves hardware utilization for engineers managing and scaling LLM inference.
Use this if you are scaling LLM inference and need to reduce memory duplication, latency, and power consumption by enabling multiple LLM instances to share KV caches directly on CXL hardware.
Not ideal if your LLM inference environment does not use CXL-enabled hardware or if you do not face significant performance bottlenecks from KV cache sharing.
Stars: 38
Forks: 4
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/xcena-dev/maru"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
ModelEngine-Group/unified-cache-management
Persist and reuse KV Cache to speedup your LLM.
reloadware/reloadium
Hot Reloading and Profiling for Python
alibaba/tair-kvcache
Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global...
October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
Zefan-Cai/Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.