KV Cache Optimization LLM Tools

Systems and frameworks for managing, compressing, and optimizing KV cache memory usage in LLM inference. Includes cache storage engines, virtual memory approaches, and persistence layers. Does NOT include general LLM caching proxies, semantic caching, or request/response deduplication tools.

28 KV cache optimization tools are tracked; one scores above 50 (the established tier). The highest-rated is ModelEngine-Group/unified-cache-management at 58/100 with 261 stars.

Get all 28 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=kv-cache-optimization&limit=28"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
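The same query can be built programmatically. The sketch below constructs the endpoint URL with `urllib.parse` and mirrors the score tiers used in the table (≥50 established, ≥30 emerging, below that experimental); the exact tier cutoffs and the `key` parameter name for API keys are assumptions, not documented behavior of this API.

```python
from urllib.parse import urlencode

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def quality_url(domain, subcategory, limit=28, api_key=None):
    """Build a dataset query URL. The `key` query parameter is an assumption."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    if api_key:
        params["key"] = api_key
    return f"{BASE}?{urlencode(params)}"

def tier(score):
    # Cutoffs inferred from the published table: 58 is Established,
    # 48-31 are Emerging, 27 and below are Experimental. The exact
    # boundaries between tiers are an assumption.
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"

print(quality_url("llm-tools", "kv-cache-optimization"))
print(tier(58), tier(39), tier(22))
```

Fetching the URL with any HTTP client (e.g. `curl` as shown above) returns the project list as JSON; the response schema is not documented here, so inspect it before relying on field names.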

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | ModelEngine-Group/unified-cache-management | Persist and reuse KV Cache to speedup your LLM. | 58 | Established |
| 2 | reloadware/reloadium | Hot Reloading and Profiling for Python | 48 | Emerging |
| 3 | alibaba/tair-kvcache | Alibaba Cloud's high-performance KVCache system for LLM inference, with... | 47 | Emerging |
| 4 | October2001/Awesome-KV-Cache-Compression | 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). | 47 | Emerging |
| 5 | Zefan-Cai/Awesome-LLM-KV-Cache | Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes. | 39 | Emerging |
| 6 | xcena-dev/maru | High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference | 38 | Emerging |
| 7 | sensoris/semcache | Semantic caching layer for your LLM applications. Reuse responses and reduce... | 37 | Emerging |
| 8 | dipampaul17/KVSplit | Run larger LLMs with longer contexts on Apple Silicon by using... | 37 | Emerging |
| 9 | samfurr/foveated_kv | Importance-adaptive mixed-precision KV cache compression for LLM inference... | 37 | Emerging |
| 10 | jjiantong/Awesome-KV-Cache-Optimization | [Survey] Towards Efficient Large Language Model Serving: A Survey on... | 34 | Emerging |
| 11 | TreeAI-Lab/Awesome-KV-Cache-Management | This repository serves as a comprehensive survey of LLM development,... | 33 | Emerging |
| 12 | TheToughCrane/nano-kvllm | This project aims to provide a high effective KV cache manage framework for... | 31 | Emerging |
| 13 | Naveenub/quantum-pulse | Extreme-density data vault for LLM training sets. MsgPack + Zstd-L22 +... | 27 | Experimental |
| 14 | helgklaizar/turboquant_mlx | Extreme KV Cache Compression (1-3 bit) for LLMs natively on Apple Silicon... | 25 | Experimental |
| 15 | jandhyala-dev/modelai-llama.cpp | Production fork of llama.cpp adding KV cache compaction via Attention Matching | 25 | Experimental |
| 16 | MSNP1381/cache-cool | 🌟 Cache-cool: A fast, flexible LLM caching proxy that reduces latency and... | 23 | Experimental |
| 17 | RemizovDenis/turboquant | TurboQuant: KV-cache compression for faster and cheaper LLM inference. | 23 | Experimental |
| 18 | Siddhant-K-code/tokenvm | TokenVM is a high-performance runtime that treats LLM KV cache and... | 22 | Experimental |
| 19 | Jamalianpour/semantic-llm-cache | Semantic caching for LLM API responses in Spring Boot applications | 22 | Experimental |
| 20 | raymond-UI/llm-cache | LLM request/response caching with tiered TTL, time travel, and request... | 22 | Experimental |
| 21 | sentinelXVI/KeSSie | Enable efficient LLM inference by managing large token histories with a... | 21 | Experimental |
| 22 | rizwan199811/neurocache | Reduce LLM API costs and speed up responses by caching completions with... | 21 | Experimental |
| 23 | hupe1980/go-llmcache | 🧠 Cache implementation for storing and retrieving results of language model... | 20 | Experimental |
| 24 | Janghyun1230/FastKVzip | Accurate and fast KV cache compression with a gating mechanism | 18 | Experimental |
| 25 | DreamSoul-AI/OBCache | OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference | 15 | Experimental |
| 26 | wenzyxx00/LMCache | Provide fast, memory-efficient caching for language models to improve... | 14 | Experimental |
| 27 | wln20/CSKV | [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for... | 14 | Experimental |
| 28 | YUECHE77/SustainableKV | This is the official implementations for SustainableKV | 13 | Experimental |