ovg-project/kvcached
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
This project helps operations engineers and infrastructure teams share expensive GPU resources among multiple Large Language Models (LLMs) or AI systems. It virtualizes the KV cache so GPU memory is allocated elastically as workloads change, improving utilization and reducing cost. The result is better GPU efficiency when serving or training diverse LLM workloads.
804 stars. Actively maintained with 25 commits in the last 30 days. Available on PyPI.
Use this if you are running multiple LLMs or complex AI systems on shared GPUs and need to improve resource utilization and reduce operational costs by dynamically managing their memory.
Not ideal if you are running a single LLM on a dedicated GPU and do not require flexible memory sharing or dynamic workload management.
Stars: 804
Forks: 90
Language: Python
License: Apache-2.0
Category: mlops
Last pushed: Mar 12, 2026
Commits (30d): 25
Dependencies: 3
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/ovg-project/kvcached"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
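The same endpoint can be called from code. A minimal Python sketch, assuming the response is JSON (the response schema is not documented here, so the parsing step is left as a commented-out fetch):

```python
from urllib.parse import quote
from urllib.request import urlopen
import json

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, repo: str) -> str:
    # Build the quality-API URL for a repo, e.g. "ovg-project/kvcached".
    # The category segment ("mlops" here) is taken from the curl example above.
    return f"{BASE}/{quote(category)}/{repo}"

url = quality_url("mlops", "ovg-project/kvcached")
print(url)

# Uncomment to fetch live data (requires network access; anonymous use is
# limited to 100 requests/day):
# with urlopen(url) as resp:
#     data = json.load(resp)
#     print(data)
```

The `quality_url` helper is hypothetical, not part of any published client; it simply mirrors the path structure shown in the curl command.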