vllm and gpustack

vLLM is a core inference engine that GPUStack wraps and orchestrates, so the two are complements rather than competitors: GPUStack adds multi-engine selection and performance tuning on top of vLLM's serving capabilities instead of replacing them.

                 vllm               gpustack
Score            87 (Verified)      68 (Established)
Maintenance      22/25              22/25
Adoption         15/25              10/25
Maturity         25/25              16/25
Community        25/25              20/25
Stars            73,007             4,630
Forks            14,312             472
Downloads        (not listed)       (not listed)
Commits (30d)    912                71
Language         Python             Python
License          Apache-2.0         Apache-2.0
Risk flags       No risk flags      No Package, No Dependents

About vllm

vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

This project helps machine learning engineers and developers efficiently deploy and serve large language models (LLMs) in production environments. You provide your chosen LLM and receive a high-throughput, memory-optimized inference service ready for use. It's designed for ML engineers, MLOps specialists, and developers who need to integrate LLM capabilities into applications without sacrificing speed or cost efficiency.

Tags: LLM deployment, model serving, AI infrastructure, MLOps, API development
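As a concrete illustration of the serving workflow described above: a running vLLM server exposes an OpenAI-compatible HTTP API, so clients talk to it with ordinary JSON requests. The sketch below builds such a request body; the base URL (vLLM's default port) and the model name are assumptions for illustration, not values from this page.

```python
import json

# vLLM's default serving address when started with `vllm serve <model>`.
# Adjust to wherever your server actually runs.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build the JSON body for an OpenAI-style /chat/completions request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Hypothetical model name, shown only to make the payload concrete.
body = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
payload = json.dumps(body)
```

In practice you would POST `payload` to `f"{BASE_URL}/chat/completions"` with any HTTP client (or point the official `openai` Python client at `BASE_URL`); because the API shape matches OpenAI's, existing application code usually needs only a base-URL change.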

About gpustack

gpustack/gpustack

Performance-optimized AI inference on your GPUs. Unlock superior throughput by selecting and tuning engines like vLLM or SGLang.

GPUStack helps organizations efficiently deploy and manage AI models for inference across various GPU setups, from on-premises servers to cloud environments. It takes your AI models and outputs optimized, high-performance services ready for use. This tool is designed for IT organizations, MLOps teams, and service providers who need to deliver AI models as a service at scale.

Tags: AI-model-deployment, GPU-cluster-management, AI-inference-optimization, MLOps, Model-as-a-Service
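To make the deployment story above concrete, here is a hedged sketch of bringing up GPUStack from PyPI. The command names follow the GPUStack README at the time of writing and may change; this is not runnable in CI since it requires GPUs and network access, so verify against the current docs.

```shell
# Sketch, assuming a pip-based install (check the GPUStack docs for your platform):
pip install gpustack

# Start a GPUStack server on the current machine; it detects local GPUs
# and serves a web UI plus an OpenAI-compatible inference endpoint.
gpustack start

# Additional GPU machines can join the cluster as workers by pointing
# at the server (URL and token are placeholders):
# gpustack start --server-url http://your-server --token your-token
```

Once the server is up, models deployed through GPUStack are served behind the same OpenAI-style API surface, which is what lets it slot engines like vLLM or SGLang in underneath without client changes.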

Scores updated daily from GitHub, PyPI, and npm data.