wln20/CSKV
[NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios
This project helps machine learning engineers and researchers optimize Large Language Models (LLMs) for handling very long text inputs. It takes an existing LLM and compresses its internal memory (KV cache) without major retraining. The result is an LLM that can process much longer contexts with significantly less memory overhead, making it more efficient for demanding applications.
No commits in the last 6 months.
Use this if you are an ML engineer or researcher facing memory constraints when deploying or experimenting with LLMs on long-context tasks, and you want to reduce memory usage with minimal retraining effort.
Not ideal if you are looking for a completely training-free solution or if you need to optimize an LLM for tasks that do not involve long-context processing.
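The core idea of channel shrinking can be sketched in a few lines (a minimal illustration only, not the repository's actual implementation: the down/up projection matrices here are random stand-ins for the calibrated ones CSKV trains, and the dimensions are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d = 4096, 1024  # context length and original KV channel dimension
r = 256                  # shrunken channel dimension (4x smaller cache)

# Random stand-ins for trained down/up projections of the KV channels.
W_down = rng.standard_normal((d, r)) / np.sqrt(d)
W_up = rng.standard_normal((r, d)) / np.sqrt(r)

K = rng.standard_normal((seq_len, d))  # keys for one head/layer

K_small = K @ W_down         # low-dimensional tensor stored in the KV cache
K_restored = K_small @ W_up  # reconstructed at attention time

ratio = K_small.nbytes / K.nbytes
print(f"cache size ratio: {ratio:.2f}")  # 0.25 -> 4x memory saving
```

Because only the projections need to be learned while the base model stays frozen, the retraining cost stays small; the memory saving scales with `d / r`.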
Stars: 16
Forks: —
Language: Python
License: —
Category: —
Last pushed: Oct 18, 2024
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/wln20/CSKV"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
ModelEngine-Group/unified-cache-management
Persist and reuse KV Cache to speed up your LLM.
reloadware/reloadium
Hot Reloading and Profiling for Python
alibaba/tair-kvcache
Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global...
October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
Zefan-Cai/Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.