KV Cache Optimization LLM Tools
Systems and frameworks for managing, compressing, and optimizing KV cache memory usage in LLM inference. Includes cache storage engines, virtual memory approaches, and persistence layers. Does NOT include general LLM caching proxies, semantic caching, or request/response deduplication tools.
There are 28 KV cache optimization tools tracked. One scores above 50 (established tier). The highest-rated is ModelEngine-Group/unified-cache-management at 58/100 with 261 stars.
Get all 28 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=kv-cache-optimization&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
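For scripted use, the request above can be wrapped in a small shell helper. This is a minimal sketch, not something taken from the API's documentation: `build_url` and `fetch_dataset` are hypothetical names, the default output filename is made up for illustration, and the sketch assumes the `limit` query parameter caps the number of projects returned (the example URL uses `limit=20`, so a larger value may be needed to retrieve all 28).

```shell
# Hypothetical helper: prints the dataset URL with a caller-chosen limit.
# Assumption: `limit` caps the number of projects returned (default 20, as in the example URL).
build_url() {
  printf '%s?domain=llm-tools&subcategory=kv-cache-optimization&limit=%s\n' \
    "https://pt-edge.onrender.com/api/v1/datasets/quality" "${1:-20}"
}

# Hypothetical helper: downloads the dataset to a local file.
# -f fails on HTTP errors; -sS hides the progress bar but still prints errors.
fetch_dataset() {
  curl -fsS "$(build_url "${1:-}")" -o "${2:-kv-cache-tools.json}"
}
```

Under those assumptions, `fetch_dataset 28` would request up to 28 projects and write the response to `kv-cache-tools.json` in the current directory.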
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | ModelEngine-Group/unified-cache-management | Persist and reuse KV Cache to speedup your LLM. | 58 | Established |
| 2 | reloadware/reloadium | Hot Reloading and Profiling for Python | | Emerging |
| 3 | alibaba/tair-kvcache | Alibaba Cloud's high-performance KVCache system for LLM inference, with... | | Emerging |
| 4 | October2001/Awesome-KV-Cache-Compression | Must-read papers on KV Cache Compression (constantly updating). | | Emerging |
| 5 | Zefan-Cai/Awesome-LLM-KV-Cache | Awesome-LLM-KV-Cache: A curated list of Awesome LLM KV Cache Papers with Codes. | | Emerging |
| 6 | xcena-dev/maru | High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference | | Emerging |
| 7 | sensoris/semcache | Semantic caching layer for your LLM applications. Reuse responses and reduce... | | Emerging |
| 8 | dipampaul17/KVSplit | Run larger LLMs with longer contexts on Apple Silicon by using... | | Emerging |
| 9 | samfurr/foveated_kv | Importance-adaptive mixed-precision KV cache compression for LLM inference... | | Emerging |
| 10 | jjiantong/Awesome-KV-Cache-Optimization | [Survey] Towards Efficient Large Language Model Serving: A Survey on... | | Emerging |
| 11 | TreeAI-Lab/Awesome-KV-Cache-Management | This repository serves as a comprehensive survey of LLM development,... | | Emerging |
| 12 | TheToughCrane/nano-kvllm | This project aims to provide a high effective KV cache manage framework for... | | Emerging |
| 13 | Naveenub/quantum-pulse | Extreme-density data vault for LLM training sets. MsgPack + Zstd-L22 +... | | Experimental |
| 14 | helgklaizar/turboquant_mlx | Extreme KV Cache Compression (1-3 bit) for LLMs natively on Apple Silicon... | | Experimental |
| 15 | jandhyala-dev/modelai-llama.cpp | Production fork of llama.cpp adding KV cache compaction via Attention Matching | | Experimental |
| 16 | MSNP1381/cache-cool | Cache-cool: A fast, flexible LLM caching proxy that reduces latency and... | | Experimental |
| 17 | RemizovDenis/turboquant | TurboQuant: KV-cache compression for faster and cheaper LLM inference. | | Experimental |
| 18 | Siddhant-K-code/tokenvm | TokenVM is a high-performance runtime that treats LLM KV cache and... | | Experimental |
| 19 | Jamalianpour/semantic-llm-cache | Semantic caching for LLM API responses in Spring Boot applications | | Experimental |
| 20 | raymond-UI/llm-cache | LLM request/response caching with tiered TTL, time travel, and request... | | Experimental |
| 21 | sentinelXVI/KeSSie | Enable efficient LLM inference by managing large token histories with a... | | Experimental |
| 22 | rizwan199811/neurocache | Reduce LLM API costs and speed up responses by caching completions with... | | Experimental |
| 23 | hupe1980/go-llmcache | Cache implementation for storing and retrieving results of language model... | | Experimental |
| 24 | Janghyun1230/FastKVzip | Accurate and fast KV cache compression with a gating mechanism | | Experimental |
| 25 | DreamSoul-AI/OBCache | OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference | | Experimental |
| 26 | wenzyxx00/LMCache | Provide fast, memory-efficient caching for language models to improve... | | Experimental |
| 27 | wln20/CSKV | [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for... | | Experimental |
| 28 | YUECHE77/SustainableKV | This is the official implementations for SustainableKV | | Experimental |