ovg-project/kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Score: 74 / 100 (Verified)

kvcached helps operations engineers and infrastructure managers share expensive GPU resources across multiple Large Language Models (LLMs) or AI systems. It virtualizes the KV cache so GPU memory can be allocated to each workload dynamically and elastically, improving utilization and reducing cost when serving or training diverse LLM workloads.

804 stars. Actively maintained with 25 commits in the last 30 days. Available on PyPI.

Use this if you are running multiple LLMs or complex AI systems on shared GPUs and need to improve resource utilization and reduce operational costs by dynamically managing their memory.

Not ideal if you are running a single LLM on a dedicated GPU and do not require flexible memory sharing or dynamic workload management.

Tags: LLM deployment, GPU resource management, AI infrastructure, Cloud cost optimization, Model serving
Maintenance: 20 / 25
Adoption: 10 / 25
Maturity: 24 / 25
Community: 20 / 25

How are scores calculated? The four component scores above, each out of 25, sum to the total: 20 + 10 + 24 + 20 = 74 / 100.
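A minimal sanity check of that arithmetic, assuming the composite is a plain sum of four equally weighted components (which the numbers above suggest, though the exact formula is not documented here):

# Recompute the composite score from the component breakdown above.
# Assumption: simple summation of four components, each out of 25.
components = {
    "Maintenance": 20,
    "Adoption": 10,
    "Maturity": 24,
    "Community": 20,
}

total = sum(components.values())       # 74
max_total = 25 * len(components)       # 100
print(f"Composite score: {total} / {max_total}")  # Composite score: 74 / 100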

Stars: 804
Forks: 90
Language: Python
License: Apache-2.0
Last pushed: Mar 12, 2026
Commits (30d): 25
Dependencies: 3

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/mlops/ovg-project/kvcached"

The API is open to everyone at 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
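For programmatic access, here is a minimal Python sketch of the same call. The endpoint URL is taken from the curl example above; the response schema is not documented on this page, so the sketch prints the JSON as-is rather than assuming field names:

import json
import urllib.request

# Same endpoint as the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/mlops/ovg-project/kvcached"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)  # assumes the endpoint returns JSON

# Pretty-print whatever comes back; the exact schema is not
# specified here, so no field names are hard-coded.
print(json.dumps(data, indent=2))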