October2001/Awesome-KV-Cache-Compression
Must-read papers on KV Cache Compression (constantly updating).
This resource provides a curated collection of research papers and projects focused on optimizing the memory usage of Large Language Models (LLMs). It gathers techniques for making LLMs run more efficiently by managing their KV Cache, the memory structure that stores attention keys and values for previously processed tokens so they need not be recomputed at every generation step. This helps AI researchers and practitioners identify and implement methods to reduce the computational demands and costs of deploying and operating LLMs.
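For readers new to the topic, here is a rough illustration of the trade-off these papers study: a minimal sketch of token eviction, assuming a toy single-head cache with a fixed sliding window. The class name, shapes, and eviction policy are illustrative assumptions, not the method of any specific paper in the list.

import numpy as np

class SlidingWindowKVCache:
    # Toy single-head KV cache that keeps only the most recent `window`
    # tokens' keys and values, trading long-range context for bounded memory.
    def __init__(self, window: int = 512):
        self.window = window
        self.keys = []    # one (head_dim,) array per cached token
        self.values = []

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        self.keys.append(k)
        self.values.append(v)
        # Evict the oldest entries once the cache exceeds the window.
        if len(self.keys) > self.window:
            self.keys = self.keys[-self.window:]
            self.values = self.values[-self.window:]

    def snapshot(self):
        # Stack into (num_cached_tokens, head_dim) arrays for attention.
        return np.stack(self.keys), np.stack(self.values)

cache = SlidingWindowKVCache(window=4)
for _ in range(10):
    cache.append(np.random.randn(64), np.random.randn(64))
k, v = cache.snapshot()  # shapes (4, 64): only the 4 most recent tokens remain

Real methods collected in the list are more selective (for example, attention-guided eviction or quantization of cached keys and values), but the memory-versus-context trade-off they manage is the same.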
Use this if you are a researcher, engineer, or practitioner working with Large Language Models and want to understand or implement methods to reduce their memory footprint and improve inference efficiency.
Not ideal if you are looking for a plug-and-play software solution, or if you want a general introduction to LLMs and do not have a technical background in their architecture and optimization.
Stars: 668
Forks: 22
Language: —
License: MIT
Category: llm-tools
Last pushed: Feb 24, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/October2001/Awesome-KV-Cache-Compression"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
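For scripted access, the same record can be fetched in Python; the sketch below assumes nothing beyond the endpoint shown above and prints the raw JSON rather than guessing at its field names.

import requests

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/October2001/Awesome-KV-Cache-Compression")

resp = requests.get(URL, timeout=10)  # anonymous access: 100 requests/day
resp.raise_for_status()               # fail loudly on rate limiting or errors
print(resp.json())                    # inspect the returned schema directly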
Higher-rated alternatives
ModelEngine-Group/unified-cache-management
Persist and reuse KV Cache to speed up your LLM.
reloadware/reloadium
Hot Reloading and Profiling for Python
alibaba/tair-kvcache
Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global...
Zefan-Cai/Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of Awesome LLM KV Cache Papers with Codes.
xcena-dev/maru
High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference