KV Cache Optimization LLM Tools

Systems and frameworks for managing, compressing, and optimizing KV cache memory usage in LLM inference. Includes cache storage engines, virtual memory approaches, and persistence layers. Does NOT include general LLM caching proxies, semantic caching, or request/response deduplication tools.

28 KV cache optimization tools are tracked; one scores above 50 (the established tier). The highest-rated is ModelEngine-Group/unified-cache-management at 58/100 with 261 stars.

Get all 28 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=kv-cache-optimization&limit=28"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
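The same query can be built programmatically. The sketch below constructs the endpoint URL with `urllib.parse` and mirrors the score tiers used in the table (≥50 established, ≥30 emerging, below that experimental); the exact tier cutoffs and the `key` parameter name for API keys are assumptions, not documented behavior of this API.

```python
from urllib.parse import urlencode

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def quality_url(domain, subcategory, limit=28, api_key=None):
    """Build a dataset query URL. The `key` query parameter is an assumption."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    if api_key:
        params["key"] = api_key
    return f"{BASE}?{urlencode(params)}"

def tier(score):
    # Cutoffs inferred from the published table: 58 is Established,
    # 48-31 are Emerging, 27 and below are Experimental. The exact
    # boundaries between tiers are an assumption.
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"

print(quality_url("llm-tools", "kv-cache-optimization"))
print(tier(58), tier(39), tier(22))
```

Fetching the URL with any HTTP client (e.g. `curl` as shown above) returns the project list as JSON; the response schema is not documented here, so inspect it before relying on field names.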

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | ModelEngine-Group/unified-cache-management | Persist and reuse KV Cache to speedup your LLM. | 58 | Established |
| 2 | reloadware/reloadium | Hot Reloading and Profiling for Python | 48 | Emerging |
| 3 | alibaba/tair-kvcache | Alibaba Cloud's high-performance KVCache system for LLM inference, with... | 47 | Emerging |
| 4 | October2001/Awesome-KV-Cache-Compression | 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). | 47 | Emerging |
| 5 | Zefan-Cai/Awesome-LLM-KV-Cache | Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes. | 39 | Emerging |
| 6 | xcena-dev/maru | High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference | 38 | Emerging |
| 7 | sensoris/semcache | Semantic caching layer for your LLM applications. Reuse responses and reduce... | 37 | Emerging |
| 8 | dipampaul17/KVSplit | Run larger LLMs with longer contexts on Apple Silicon by using... | 37 | Emerging |
| 9 | samfurr/foveated_kv | Importance-adaptive mixed-precision KV cache compression for LLM inference... | 37 | Emerging |
| 10 | jjiantong/Awesome-KV-Cache-Optimization | [Survey] Towards Efficient Large Language Model Serving: A Survey on... | 34 | Emerging |
| 11 | TreeAI-Lab/Awesome-KV-Cache-Management | This repository serves as a comprehensive survey of LLM development,... | 33 | Emerging |
| 12 | TheToughCrane/nano-kvllm | This project aims to provide a high effective KV cache manage framework for... | 31 | Emerging |
| 13 | Naveenub/quantum-pulse | Extreme-density data vault for LLM training sets. MsgPack + Zstd-L22 +... | 27 | Experimental |
| 14 | helgklaizar/turboquant_mlx | Extreme KV Cache Compression (1-3 bit) for LLMs natively on Apple Silicon... | 25 | Experimental |
| 15 | jandhyala-dev/modelai-llama.cpp | Production fork of llama.cpp adding KV cache compaction via Attention Matching | 25 | Experimental |
| 16 | MSNP1381/cache-cool | 🌟 Cache-cool: A fast, flexible LLM caching proxy that reduces latency and... | 23 | Experimental |
| 17 | RemizovDenis/turboquant | TurboQuant: KV-cache compression for faster and cheaper LLM inference. | 23 | Experimental |
| 18 | Siddhant-K-code/tokenvm | TokenVM is a high-performance runtime that treats LLM KV cache and... | 22 | Experimental |
| 19 | Jamalianpour/semantic-llm-cache | Semantic caching for LLM API responses in Spring Boot applications | 22 | Experimental |
| 20 | raymond-UI/llm-cache | LLM request/response caching with tiered TTL, time travel, and request... | 22 | Experimental |
| 21 | sentinelXVI/KeSSie | Enable efficient LLM inference by managing large token histories with a... | 21 | Experimental |
| 22 | rizwan199811/neurocache | Reduce LLM API costs and speed up responses by caching completions with... | 21 | Experimental |
| 23 | hupe1980/go-llmcache | 🧠 Cache implementation for storing and retrieving results of language model... | 20 | Experimental |
| 24 | Janghyun1230/FastKVzip | Accurate and fast KV cache compression with a gating mechanism | 18 | Experimental |
| 25 | DreamSoul-AI/OBCache | OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference | 15 | Experimental |
| 26 | wenzyxx00/LMCache | Provide fast, memory-efficient caching for language models to improve... | 14 | Experimental |
| 27 | wln20/CSKV | [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for... | 14 | Experimental |
| 28 | YUECHE77/SustainableKV | This is the official implementations for SustainableKV | 13 | Experimental |