KV Cache Optimization: Transformer Models

There are 9 KV cache optimization projects tracked. 1 scores above 70 (verified tier). The highest-rated is LMCache/LMCache at 79/100 with 7,664 stars. Only 1 of the top 10 is actively maintained.

Get all 9 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=kv-cache-optimization&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
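Once you have the JSON response, filtering it down to the verified tier is a few lines of Python. This is a minimal sketch: the `projects`, `name`, `score`, and `tier` field names are assumptions about the response schema, not confirmed from the API, and the sample payload below is inlined rather than fetched.

```python
import json

# Hypothetical response shape -- the actual schema returned by the
# pt-edge.onrender.com quality endpoint may use different field names.
sample = json.loads("""
{
  "projects": [
    {"name": "LMCache/LMCache", "score": 79, "tier": "Verified"},
    {"name": "Zefan-Cai/KVCache-Factory", "score": 47, "tier": "Emerging"},
    {"name": "DRSY/EasyKV", "score": 25, "tier": "Experimental"}
  ]
}
""")

def verified(projects, threshold=70):
    """Return names of projects at or above the verified-tier score cutoff."""
    return [p["name"] for p in projects if p["score"] >= threshold]

print(verified(sample["projects"]))  # -> ['LMCache/LMCache']
```

To run this against the live endpoint, replace the inlined `sample` with the parsed body of the curl request above.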

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | LMCache/LMCache | Supercharge Your LLM with the Fastest KV Cache Layer | 79 | Verified |
| 2 | Zefan-Cai/KVCache-Factory | Unified KV Cache Compression Methods for Auto-Regressive Models | 47 | Emerging |
| 3 | dataflowr/llm_efficiency | KV Cache & LoRA for minGPT | 41 | Emerging |
| 4 | OnlyTerp/kvtc | First open-source KVTC implementation (NVIDIA, ICLR 2026) -- 8-32x KV cache... | 38 | Emerging |
| 5 | itsnamgyu/block-transformer | Block Transformer: Global-to-Local Language Modeling for Fast Inference... | 38 | Emerging |
| 6 | OnlyTerp/turboquant | First open-source implementation of Google TurboQuant (ICLR 2026) --... | 35 | Emerging |
| 7 | codepawl/turboquant-torch | Unofficial PyTorch implementation of TurboQuant (Google Research, ICLR... | 27 | Experimental |
| 8 | DRSY/EasyKV | Easy control for Key-Value Constrained Generative LLM... | 25 | Experimental |
| 9 | DingWeiPeng/Transformer-decoder-only-with-KV-Cache | Transformer Key Value Store/Transformer Decoder Only with KV Cache | 10 | Experimental |