neelsomani/kv-marketplace
Cross-GPU KV Cache Marketplace
This project helps large language model (LLM) serving platforms improve efficiency by sharing common prompt prefixes across requests and GPUs. When multiple users send prompts that begin with the same text (such as a system instruction or a common question), the system reuses the attention key/value (KV) cache produced for that shared prefix instead of recomputing it for every request. You put in multiple text prompts and get out faster, more efficient text generation, making it well suited to organizations running LLM inference at scale.
Use this if you are running a large language model serving environment and want to reduce redundant computations and improve throughput by reusing common prompt prefixes across multiple GPUs on the same machine.
Not ideal if your LLM workload consists primarily of unique, short prompts with no overlapping prefixes, or if you need to share KV caches across different machines.
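To make the prefix-reuse idea concrete, here is a minimal sketch of a prefix-keyed lookup table. This is an illustration only, not the kv-marketplace API: the class name, hashing scheme, and the string standing in for KV tensors are all assumptions for the example.

```python
import hashlib

# Toy in-memory prefix cache (illustrative, NOT the kv-marketplace API).
# Keys are hashes of token-ID prefixes; values stand in for the attention
# key/value (KV) tensors computed for that prefix.
class PrefixKVCache:
    def __init__(self):
        self._cache = {}  # prefix hash -> cached KV entry

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(repr(tokens).encode("utf-8")).hexdigest()

    def put(self, tokens, kv_entry):
        self._cache[self._key(tokens)] = kv_entry

    def longest_prefix(self, tokens):
        # Scan from the full prompt down to length 1, returning the
        # length of the longest cached prefix and its KV entry.
        for n in range(len(tokens), 0, -1):
            entry = self._cache.get(self._key(tokens[:n]))
            if entry is not None:
                return n, entry
        return 0, None

cache = PrefixKVCache()
system_prefix = [101, 7592, 2088]          # hypothetical system-prompt tokens
cache.put(system_prefix, "kv-for-prefix")  # stored after the first request

# A later request starting with the same tokens can skip recomputing
# attention for the first three positions:
hit_len, kv = cache.longest_prefix(system_prefix + [999, 1000])
```

In a real serving stack the cached values would be GPU-resident tensors and the lookup would coordinate across devices, but the core mechanic is this longest-prefix match.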
Stars
22
Forks
3
Language
Python
License
MIT
Category
Last pushed
Nov 12, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/neelsomani/kv-marketplace"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference...
helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
bsc-wdc/dislib
The Distributed Computing library for python implemented using PyCOMPSs programming model for HPC.