jjiantong/Awesome-KV-Cache-Optimization
[Survey] Towards Efficient Large Language Model Serving: A Survey on System-Aware KV Cache Optimization
This is a curated collection of research papers and resources focused on making large language models (LLMs) serve requests more efficiently without changing their core design or requiring retraining. It categorizes and explains techniques for optimizing the key-value (KV) cache that LLMs build during inference, aiming to improve speed and reduce resource consumption. AI/ML system engineers, researchers, and infrastructure managers who deploy and maintain LLMs would find this useful.
Use this if you are responsible for optimizing the performance of deployed large language models and need to understand the latest techniques for improving their serving efficiency.
Not ideal if you are looking for an LLM development library or a guide on fine-tuning models.
Stars: 310
Forks: 10
Language: Python
License: —
Category:
Last pushed: Jan 18, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jjiantong/Awesome-KV-Cache-Optimization"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
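The same endpoint can be scripted. A minimal Python sketch, assuming only the URL pattern shown in the curl command above; the JSON response schema is not documented on this page, so the body is decoded generically:

```python
# Sketch of calling the quality API, based on the curl example above.
# The response schema is undocumented here, so we only decode JSON generically.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """GET the endpoint and decode the JSON body (fields unknown here)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(quality_url("jjiantong", "Awesome-KV-Cache-Optimization"))
```

With a free key, daily limits rise from 100 to 1,000 requests; how the key is passed (header vs. query parameter) is not specified on this page.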
Higher-rated alternatives
ModelEngine-Group/unified-cache-management: Persist and reuse KV cache to speed up your LLM.
reloadware/reloadium: Hot reloading and profiling for Python.
alibaba/tair-kvcache: Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global...
October2001/Awesome-KV-Cache-Compression: 📰 Must-read papers on KV cache compression (constantly updated 🤗).
Zefan-Cai/Awesome-LLM-KV-Cache: A curated list of 📙 awesome LLM KV cache papers with code.