jjiantong/Awesome-KV-Cache-Optimization

[Survey] Towards Efficient Large Language Model Serving: A Survey on System-Aware KV Cache Optimization

Score: 34 / 100 (Emerging)

This is a curated collection of research papers and resources focused on making large language models (LLMs) run more efficiently without changing their core design or requiring retraining. It categorizes and explains techniques for optimizing the key-value (KV) cache that LLMs use to store and retrieve attention state during inference, with the goal of improving serving speed and reducing resource consumption. AI/ML system engineers, researchers, and infrastructure managers who deploy and maintain LLMs would find this useful.
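For context, the KV cache these papers optimize can be sketched minimally: during autoregressive decoding, each step appends one key/value pair to a cache instead of recomputing attention state for the whole sequence. This is a toy pure-Python illustration (real systems use GPU tensors and the techniques surveyed in the repo, such as eviction and compression); the class and numbers are invented for illustration.

```python
import math

class KVCache:
    """Toy per-layer KV cache for single-query autoregressive attention."""

    def __init__(self):
        self.keys: list[list[float]] = []
        self.values: list[list[float]] = []

    def append(self, k: list[float], v: list[float]) -> None:
        # One decode step adds exactly one key/value pair; nothing is recomputed.
        self.keys.append(k)
        self.values.append(v)

    def attend(self, q: list[float]) -> list[float]:
        # Scaled dot-product attention of a single query over all cached keys.
        d = len(q)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in self.keys]
        m = max(scores)                          # subtract max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        dim_v = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values))
                for i in range(dim_v)]

cache = KVCache()
cache.append([1.0, 0.0], [2.0, 0.0])  # step 1
cache.append([0.0, 1.0], [0.0, 2.0])  # step 2
print(cache.attend([1.0, 0.0]))       # query most similar to the first key
```

The cache grows linearly with sequence length, which is exactly the memory and bandwidth cost the surveyed optimizations (eviction, quantization, paging, sharing) aim to reduce.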


Use this if you are responsible for optimizing the performance of deployed large language models and need to understand the latest techniques for improving their serving efficiency.

Not ideal if you are looking for an LLM development library or a guide on fine-tuning models.

Tags: LLM deployment, AI infrastructure, Model serving optimization, System engineering, Machine learning operations
No license, no package, no dependents
Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 5 / 25
Community: 9 / 25


Stars: 310
Forks: 10
Language: Python
License: None
Last pushed: Jan 18, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jjiantong/Awesome-KV-Cache-Optimization"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
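The curl command above can also be scripted. A minimal Python sketch, assuming the endpoint follows the `{category}/{owner}/{repo}` pattern shown in the example and returns JSON (the response fields and any API-key header name are not documented here, so they are left out):

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Build the per-repo endpoint URL; the path pattern is taken
    # from the curl example above.
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    # Anonymous access is rate-limited to 100 requests/day.
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(quality_url("llm-tools", "jjiantong", "Awesome-KV-Cache-Optimization"))
```

Swap in your own category/owner/repo triple to query other listed projects.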