novitalabs/pegaflow
High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and SGLang.
During LLM inference, the KV cache stores the attention keys and values for previously processed tokens so they are not recomputed. PegaFlow offloads this cache from expensive GPU memory to cheaper host memory or SSDs, and can share it across machines over RDMA. It is aimed at MLOps engineers and platform teams responsible for deploying and operating LLMs efficiently in production.
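The project targets vLLM and SGLang integration. As a rough illustration of how an external KV-cache backend is typically wired into vLLM, here is a minimal sketch using vLLM's KV-transfer configuration. The connector name "PegaFlowConnector" and its role value are assumptions for illustration, not confirmed PegaFlow API; check the repo's docs for the actual registration name.

```python
# Minimal sketch of plugging an external KV-cache backend into vLLM.
# ASSUMPTION: the connector name "PegaFlowConnector" is hypothetical;
# the KVTransferConfig mechanism itself is standard vLLM.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    kv_transfer_config=KVTransferConfig(
        kv_connector="PegaFlowConnector",  # hypothetical name
        kv_role="kv_both",                 # both save and load KV blocks
    ),
)

# Repeated or shared prefixes can now hit the offloaded cache
# instead of being recomputed on the GPU.
print(llm.generate(["Hello"], SamplingParams(max_tokens=16)))
```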
Use this if you are running LLMs and want to reduce GPU memory usage, improve performance by sharing KV cache across requests or instances, or deploy models more cost-effectively on a cluster.
Not ideal if you are experimenting with LLMs on a single GPU locally and do not face significant memory or scaling challenges.
Stars
27
Forks
6
Language
Rust
License
Apache-2.0
Category
vector-db
Last pushed
Mar 20, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/novitalabs/pegaflow"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
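For programmatic access, the same endpoint can be queried from Python. The response schema is not documented here, so this sketch simply prints the raw JSON rather than assuming any field names.

```python
# Fetch the quality-card JSON for this repo using only the stdlib.
# The payload's fields are not documented here, so print it as-is.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/vector-db/novitalabs/pegaflow"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))
```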