Zefan-Cai/R-KV

[NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models

Quality score: 46 / 100 (Emerging)

When working with large language models to solve complex reasoning tasks, like advanced math problems, the model can generate very long thought processes that consume a lot of memory. R-KV helps by intelligently compressing this 'thought process' (KV cache) on the fly, discarding repetitive or less important information. This allows the model to achieve the same or even better accuracy while using significantly less memory and running much faster. It's designed for anyone deploying or managing large language models for reasoning-heavy applications.
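To make the idea concrete, here is a minimal, hypothetical sketch of redundancy-aware KV cache eviction: score each cached token by an importance estimate, penalize tokens whose key vectors are near-duplicates of others, and keep only the top-scoring tokens within a fixed budget. The function name `rkv_style_compress`, the `redundancy_weight` parameter, and the cosine-similarity redundancy measure are illustrative assumptions, not the repository's actual algorithm.

```python
import numpy as np

def rkv_style_compress(keys, values, importance, budget, redundancy_weight=0.5):
    """Hypothetical sketch: keep the `budget` cached tokens that are
    important AND not redundant.
    keys/values: (seq_len, head_dim); importance: (seq_len,) per-token scores.
    """
    # Cosine similarity between key vectors as a proxy for redundancy.
    norm = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-8)
    sim = norm @ norm.T
    np.fill_diagonal(sim, 0.0)
    # A token highly similar to some other cached token is redundant.
    redundancy = sim.max(axis=1)
    score = (1 - redundancy_weight) * importance - redundancy_weight * redundancy
    # Top-`budget` tokens by score, restored to original sequence order.
    keep = np.sort(np.argsort(score)[-budget:])
    return keys[keep], values[keep]

# Toy demo: 6 cached tokens with head_dim 4, compressed to a budget of 3.
rng = np.random.default_rng(0)
k = rng.normal(size=(6, 4))
k[5] = k[0]  # token 5 duplicates token 0, so one of them is redundant
v = rng.normal(size=(6, 4))
imp = np.array([0.9, 0.2, 0.5, 0.7, 0.1, 0.9])
ck, cv = rkv_style_compress(k, v, imp, budget=3)
print(ck.shape)  # compressed cache holds only 3 of the 6 tokens
```

In a real decoder the same scoring would run per attention head during generation, with importance typically derived from accumulated attention weights rather than supplied externally.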


Use this if you are running large language models for complex, multi-step reasoning tasks and are facing challenges with high memory usage or slow inference speeds due to long generated outputs.

Not ideal if your models generate short responses or if your primary bottleneck is prompt processing rather than output generation during reasoning.

Tags: AI deployment, large language models, model inference, reasoning tasks, resource optimization
No license, no published package, no dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 7 / 25
Community 23 / 25


Stars: 1,183
Forks: 190
Language: Python
License: none
Last pushed: Oct 16, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/Zefan-Cai/R-KV"

Open to everyone: 100 requests/day with no key needed. Get a free API key for 1,000 requests/day.