novitalabs/pegaflow
High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and SGLang.
During LLM inference, the KV cache stores the attention keys and values for previously processed tokens so they are not recomputed. PegaFlow offloads this cache from expensive GPU memory to cheaper host memory or SSDs, and can share it across machines over RDMA. It is aimed at MLOps engineers and platform teams responsible for deploying and operating LLMs efficiently in production.
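The project targets vLLM and SGLang integration. As a rough illustration of how an external KV-cache backend is typically wired into vLLM, here is a minimal sketch using vLLM's KV-transfer configuration. The connector name "PegaFlowConnector" and its role value are assumptions for illustration, not confirmed PegaFlow API; check the repo's docs for the actual registration name.

```python
# Minimal sketch of plugging an external KV-cache backend into vLLM.
# ASSUMPTION: the connector name "PegaFlowConnector" is hypothetical;
# the KVTransferConfig mechanism itself is standard vLLM.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    kv_transfer_config=KVTransferConfig(
        kv_connector="PegaFlowConnector",  # hypothetical name
        kv_role="kv_both",                 # both save and load KV blocks
    ),
)

# Repeated or shared prefixes can now hit the offloaded cache
# instead of being recomputed on the GPU.
print(llm.generate(["Hello"], SamplingParams(max_tokens=16)))
```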
Use this if you are running LLMs and want to reduce GPU memory usage, improve performance by sharing KV cache across requests or instances, or deploy models more cost-effectively on a cluster.
Not ideal if you are experimenting with LLMs on a single GPU locally and do not face significant memory or scaling challenges.
Stars
27
Forks
6
Language
Rust
License
Apache-2.0
Category
vector-db
Last pushed
Mar 20, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/novitalabs/pegaflow"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
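For programmatic access, the same endpoint can be queried from Python. The response schema is not documented here, so this sketch simply prints the raw JSON rather than assuming any field names.

```python
# Fetch the quality-card JSON for this repo using only the stdlib.
# The payload's fields are not documented here, so print it as-is.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/vector-db/novitalabs/pegaflow"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))
```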