neelsomani/kv-marketplace

Cross-GPU KV Cache Marketplace

Overall score: 36 / 100 (Emerging)

This project helps large language model (LLM) serving platforms improve efficiency by sharing common parts of user prompts across different requests and GPUs. When multiple users send prompts that start with the same phrases (like a system instruction or a common question), the system can reuse the attention key-value (KV) cache produced for that shared prefix instead of recomputing it every time. In practice, you send in multiple text prompts and get faster, more efficient text generation back, which makes it well suited to organizations running LLM inference at scale.
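To make the prefix-reuse idea concrete, here is a minimal, hypothetical Python sketch of a cache keyed by hashes of prompt-prefix tokens. The names (PrefixKVCache, put, longest_prefix_hit) are illustrative only and are not kv-marketplace's actual API; the point is that a request looks up the longest already-cached prefix and only recomputes the remaining suffix.

import hashlib

class PrefixKVCache:
    """Hypothetical map from a hash of prompt-prefix tokens to cached KV tensors."""

    def __init__(self):
        self._store = {}  # prefix hash -> opaque KV blob (e.g., per-layer tensors)

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def put(self, prefix_tokens, kv_blob):
        self._store[self._key(prefix_tokens)] = kv_blob

    def longest_prefix_hit(self, prompt_tokens):
        # Walk from the longest candidate prefix down to length 1 and return the
        # first cached match, so only the uncached suffix needs fresh compute.
        for end in range(len(prompt_tokens), 0, -1):
            blob = self._store.get(self._key(prompt_tokens[:end]))
            if blob is not None:
                return end, blob
        return 0, None

# Usage: a shared system prompt is cached once, then reused by later requests.
cache = PrefixKVCache()
system_prompt = [101, 102, 103]  # token IDs of a common instruction
cache.put(system_prompt, kv_blob="<KV tensors for the shared prefix>")
hit_len, kv = cache.longest_prefix_hit(system_prompt + [7, 8, 9])
print(hit_len, kv)  # 3 tokens hit; only the 3-token suffix is recomputed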

Use this if you are running a large language model serving environment and want to reduce redundant computations and improve throughput by reusing common prompt prefixes across multiple GPUs on the same machine.

Not ideal if your LLM workload consists primarily of unique, short prompts with no overlapping prefixes, or if you need to share KV caches across different machines.

Tags: LLM-serving, AI-inference, GPU-utilization, chatbot-optimization, text-generation-efficiency
No Package · No Dependents
Maintenance 6 / 25
Adoption 6 / 25
Maturity 13 / 25
Community 11 / 25


Stars: 22
Forks: 3
Language: Python
License: MIT
Last pushed: Nov 12, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/neelsomani/kv-marketplace"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
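If you would rather call the same endpoint from Python than curl, a minimal sketch using the requests library (assuming it is installed and that the endpoint returns JSON) looks like this:

import requests

# Same endpoint as the curl example above; no API key is needed
# within the free tier of 100 requests/day.
url = ("https://pt-edge.onrender.com/api/v1/quality/"
       "ml-frameworks/neelsomani/kv-marketplace")
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(resp.json())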