Zefan-Cai/R-KV
[NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
When a large language model works through a complex reasoning task, such as an advanced math problem, it can generate a very long chain of thought whose KV cache consumes a large amount of GPU memory. R-KV compresses this cache on the fly, scoring cached tokens and evicting those that are redundant or unimportant. The model reaches the same or better accuracy while using significantly less memory and running faster. It is designed for anyone deploying or managing large language models for reasoning-heavy applications.
Use this if you are running large language models for complex, multi-step reasoning tasks and are facing challenges with high memory usage or slow inference speeds due to long generated outputs.
Not ideal if your models generate short responses or if your primary bottleneck is prompt processing rather than output generation during reasoning.
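To make "discarding repetitive or less important information" concrete, here is a minimal sketch of redundancy-aware KV eviction. This is not R-KV's actual algorithm: the greedy selection, the attention-based importance scores, and the `lam` trade-off weight are all illustrative assumptions; the real method is described in the paper.

```python
import numpy as np

def rkv_style_compress(keys, values, attn_scores, budget, lam=0.5):
    """Illustrative sketch (not the paper's exact method): greedily keep
    `budget` cached tokens, trading off attention importance against
    cosine similarity to tokens already kept (redundancy)."""
    n = keys.shape[0]
    norms = np.linalg.norm(keys, axis=1, keepdims=True)
    unit = keys / np.maximum(norms, 1e-8)   # unit-normalized keys
    kept = []
    max_sim = np.zeros(n)                   # each token's max similarity to the kept set
    for _ in range(min(budget, n)):
        # score = importance minus a redundancy penalty
        score = attn_scores - lam * max_sim
        score[kept] = -np.inf               # never re-pick a kept token
        i = int(np.argmax(score))
        kept.append(i)
        max_sim = np.maximum(max_sim, unit @ unit[i])
    kept.sort()
    return keys[kept], values[kept], kept
```

With a budget of 2, a token whose key duplicates an already-kept one is skipped in favor of a more diverse token, even if its raw attention score is higher.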
Stars
1,183
Forks
190
Language
Python
License
—
Category
Last pushed
Oct 16, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/Zefan-Cai/R-KV"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
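For programmatic access, the endpoint can be built from the owner/repo slug. This is a small sketch generalized from the single example above; the assumption that other repositories follow the same `/api/v1/quality/embeddings/{owner}/{repo}` path pattern is mine, not documented here.

```python
from urllib.parse import quote

# Assumed pattern, generalized from the single curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/embeddings"

def quality_url(owner: str, repo: str) -> str:
    """Build the (assumed) per-repository endpoint URL."""
    return f"{BASE}/{quote(owner)}/{quote(repo)}"
```

The resulting URL can then be fetched with `urllib.request.urlopen(quality_url("Zefan-Cai", "R-KV"))`, or with curl as shown above.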
Related tools
snu-mllab/KVzip
[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in...
codefuse-ai/ModelCache
A LLM semantic caching system aiming to enhance user experience by reducing response time via...
philtimmes/KeSSie
KeSSie HUGE Context Semantic recall for Large Language Models