jjiantong/Awesome-KV-Cache-Optimization
[Survey] Towards Efficient Large Language Model Serving: A Survey on System-Aware KV Cache Optimization
This is a curated collection of research papers and resources focused on making large language models (LLMs) serve requests more efficiently without changing their core design or requiring retraining. It categorizes and explains techniques for optimizing the key-value (KV) cache that LLMs build during inference, aiming to improve speed and reduce resource consumption. AI/ML system engineers, researchers, and infrastructure managers who deploy and maintain LLMs would find this useful.
Use this if you are responsible for optimizing the performance of deployed large language models and need to understand the latest techniques for improving their serving efficiency.
Not ideal if you are looking for an LLM development library or a guide on fine-tuning models.
Stars: 310
Forks: 10
Language: Python
License: —
Category:
Last pushed: Jan 18, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jjiantong/Awesome-KV-Cache-Optimization"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
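The same endpoint can be scripted. A minimal Python sketch, assuming only the URL pattern shown in the curl command above; the JSON response schema is not documented on this page, so the body is decoded generically:

```python
# Sketch of calling the quality API, based on the curl example above.
# The response schema is undocumented here, so we only decode JSON generically.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """GET the endpoint and decode the JSON body (fields unknown here)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(quality_url("jjiantong", "Awesome-KV-Cache-Optimization"))
```

With a free key, daily limits rise from 100 to 1,000 requests; how the key is passed (header vs. query parameter) is not specified on this page.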
Higher-rated alternatives
ModelEngine-Group/unified-cache-management: Persist and reuse KV cache to speed up your LLM.
reloadware/reloadium: Hot reloading and profiling for Python.
alibaba/tair-kvcache: Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global...
October2001/Awesome-KV-Cache-Compression: 📰 Must-read papers on KV cache compression (constantly updated 🤗).
Zefan-Cai/Awesome-LLM-KV-Cache: A curated list of 📙 awesome LLM KV cache papers with code.