xcena-dev/maru

High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference

Score: 38 / 100 (Emerging)

Maru speeds up large language model (LLM) inference and makes it more efficient by changing how KV caches are shared. Instead of copying data between LLM instances, Maru lets them directly access a shared memory pool on CXL-enabled hardware. This reduces latency and improves hardware utilization for engineers managing and scaling LLM inference.

Use this if you are scaling LLM inference and need to reduce memory duplication, latency, and power consumption by enabling multiple LLM instances to share KV caches directly on CXL hardware.

Not ideal if your LLM inference environment does not use CXL-enabled hardware, or if KV cache duplication is not a significant performance bottleneck for you.
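The sketch below illustrates the zero-copy idea behind this design: two processes map the same KV cache buffer instead of copying it between them. Python's standard multiprocessing.shared_memory stands in for a CXL-backed shared memory pool, and the segment name, cache shape, and helper functions are hypothetical illustrations, not Maru's actual API.

# Minimal sketch of zero-copy KV cache sharing. multiprocessing.shared_memory
# stands in for a CXL-backed shared memory pool; the segment name, cache shape,
# and helpers are hypothetical, not Maru's actual API.
import numpy as np
from multiprocessing import shared_memory

KV_SHAPE = (32, 2, 1024, 8, 128)  # (layer, key/value, token, head, head_dim), illustrative
KV_DTYPE = np.float16

def publish_kv_cache(name):
    """Producer: allocate a named shared segment and write KV data into it."""
    nbytes = int(np.prod(KV_SHAPE)) * np.dtype(KV_DTYPE).itemsize
    shm = shared_memory.SharedMemory(name=name, create=True, size=nbytes)
    kv = np.ndarray(KV_SHAPE, dtype=KV_DTYPE, buffer=shm.buf)
    kv.fill(0.5)  # a real inference engine would write attention keys/values here
    del kv        # drop the view before the segment is closed later
    return shm

def read_kv_cache(name):
    """Consumer: attach to the same segment and read it in place, with no copy."""
    shm = shared_memory.SharedMemory(name=name)
    kv = np.ndarray(KV_SHAPE, dtype=KV_DTYPE, buffer=shm.buf)
    checksum = float(kv[0, 0, 0].sum())  # touch the data to show it is readable
    del kv
    shm.close()
    return checksum

if __name__ == "__main__":
    segment = publish_kv_cache("demo-kv-pool")
    print(read_kv_cache("demo-kv-pool"))  # a second "instance" sees the same bytes
    segment.close()
    segment.unlink()

In a CXL deployment, the producer and consumer would typically be separate inference processes, possibly on different hosts attached to the same memory pool; the point of the sketch is only that the consumer reads the cache in place rather than receiving a copy.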

Tags: LLM-inference, data-center-optimization, memory-management, AI-infrastructure, compute-scaling
No package · No dependents
Maintenance: 10 / 25
Adoption: 7 / 25
Maturity: 11 / 25
Community: 10 / 25


Stars: 38
Forks: 4
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/xcena-dev/maru"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
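For programmatic access from Python, the same endpoint can be fetched with the requests library. The response schema is not documented in this listing, so the example below simply prints whatever JSON the endpoint returns.

import requests

# Same endpoint as the curl example above; no API key is needed within the free tier.
url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/xcena-dev/maru"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(resp.json())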