novitalabs/pegaflow

High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and SGLang.

Score: 49 / 100 (Emerging)

When running large language models (LLMs) for inference, PegaFlow manages the KV cache, the store of attention keys and values computed for previous tokens. It takes the KV cache, which normally sits in expensive GPU memory, and moves it to cheaper host memory or SSDs, and can even share it across machines over RDMA. The tool is aimed at MLOps engineers and platform teams who deploy and manage LLMs in production.

Use this if you are running LLMs and want to reduce GPU memory usage, improve performance by sharing KV cache across requests or instances, or deploy models more cost-effectively on a cluster.

Not ideal if you are experimenting with LLMs on a single GPU locally and do not face significant memory or scaling challenges.
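
To make the tiered-offloading idea concrete, here is a minimal Rust sketch of a cache index that tracks which tier (GPU, host memory, SSD, or a remote node) currently holds a given KV block. All names here (Tier, KvCacheIndex, demote) are invented for illustration and are not PegaFlow's actual API.

    use std::collections::HashMap;

    /// Where a cached KV block lives, from fastest to cheapest tier.
    /// (Hypothetical types for illustration only.)
    #[allow(dead_code)]
    enum Tier {
        Gpu,
        HostMemory,
        Ssd,
        RemoteNode(String), // peer address reached over RDMA
    }

    /// A toy index mapping a prompt-prefix hash to the tier holding its KV blocks.
    struct KvCacheIndex {
        entries: HashMap<u64, Tier>,
    }

    impl KvCacheIndex {
        /// On a hit, the engine can reuse stored KV blocks instead of
        /// recomputing attention over a shared prompt prefix.
        fn lookup(&self, prefix_hash: u64) -> Option<&Tier> {
            self.entries.get(&prefix_hash)
        }

        /// Evicting from GPU demotes a block to host memory rather than
        /// discarding it, keeping it reusable at lower cost.
        fn demote(&mut self, prefix_hash: u64) {
            if let Some(tier) = self.entries.get_mut(&prefix_hash) {
                if matches!(tier, Tier::Gpu) {
                    *tier = Tier::HostMemory;
                }
            }
        }
    }

    fn main() {
        let mut index = KvCacheIndex { entries: HashMap::new() };
        index.entries.insert(0xABCD, Tier::Gpu);
        index.demote(0xABCD);
        assert!(matches!(index.lookup(0xABCD), Some(Tier::HostMemory)));
    }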

Tags: LLM-inference, MLOps, GPU-optimization, model-serving, AI-infrastructure
No package · No dependents

Maintenance: 13 / 25
Adoption: 7 / 25
Maturity: 13 / 25
Community: 16 / 25

Stars: 27
Forks: 6
Language: Rust
License: Apache-2.0
Last pushed: Mar 20, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/novitalabs/pegaflow"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
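
As a sketch, assuming the reqwest and tokio crates, the same data can be fetched from Rust. The response schema is not documented here, so the example just prints the raw body.

    // Cargo.toml (assumed):
    //   reqwest = "0.12"
    //   tokio = { version = "1", features = ["full"] }

    #[tokio::main]
    async fn main() -> Result<(), reqwest::Error> {
        // Same endpoint as the curl example above; no key needed
        // for up to 100 requests/day.
        let url = "https://pt-edge.onrender.com/api/v1/quality/vector-db/novitalabs/pegaflow";
        let body = reqwest::get(url).await?.text().await?;
        println!("{body}");
        Ok(())
    }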