ovg-project/kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Score: 74 / 100 (Verified)

kvcached helps operations engineers and infrastructure managers share expensive GPU resources across multiple Large Language Models (LLMs) or AI systems. It virtualizes the KV cache so GPU memory can be allocated to each workload dynamically and elastically, improving utilization and reducing cost when serving or training diverse LLM workloads.

804 stars. Actively maintained with 25 commits in the last 30 days. Available on PyPI.

Use this if you are running multiple LLMs or complex AI systems on shared GPUs and need to improve resource utilization and reduce operational costs by dynamically managing their memory.

Not ideal if you are running a single LLM on a dedicated GPU and do not require flexible memory sharing or dynamic workload management.

Tags: LLM deployment, GPU resource management, AI infrastructure, Cloud cost optimization, Model serving
Maintenance: 20 / 25
Adoption: 10 / 25
Maturity: 24 / 25
Community: 20 / 25

How are scores calculated? The four component scores above, each out of 25, sum to the total: 20 + 10 + 24 + 20 = 74 / 100.
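A minimal sanity check of that arithmetic, assuming the composite is a plain sum of four equally weighted components (which the numbers above suggest, though the exact formula is not documented here):

# Recompute the composite score from the component breakdown above.
# Assumption: simple summation of four components, each out of 25.
components = {
    "Maintenance": 20,
    "Adoption": 10,
    "Maturity": 24,
    "Community": 20,
}

total = sum(components.values())       # 74
max_total = 25 * len(components)       # 100
print(f"Composite score: {total} / {max_total}")  # Composite score: 74 / 100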

Stars: 804
Forks: 90
Language: Python
License: Apache-2.0
Last pushed: Mar 12, 2026
Commits (30d): 25
Dependencies: 3

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/mlops/ovg-project/kvcached"

The API is open to everyone at 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
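For programmatic access, here is a minimal Python sketch of the same call. The endpoint URL is taken from the curl example above; the response schema is not documented on this page, so the sketch prints the JSON as-is rather than assuming field names:

import json
import urllib.request

# Same endpoint as the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/mlops/ovg-project/kvcached"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)  # assumes the endpoint returns JSON

# Pretty-print whatever comes back; the exact schema is not
# specified here, so no field names are hard-coded.
print(json.dumps(data, indent=2))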