vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
This project helps machine learning engineers, MLOps specialists, and developers efficiently deploy and serve large language models (LLMs) in production. You provide your chosen model and receive a high-throughput, memory-optimized inference service ready for use, letting you integrate LLM capabilities into applications without sacrificing speed or cost efficiency.
73,007 stars. Used by 46 other packages. Actively maintained with 912 commits in the last 30 days. Available on PyPI.
Use this if you need to run large language models at scale, serve many user requests simultaneously, and minimize the computational resources required.
Not ideal if you are a casual user looking for a pre-built chatbot or a simple API wrapper without needing to manage the underlying model serving infrastructure.
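In practice, "serving at scale" means launching vLLM's OpenAI-compatible HTTP server (e.g. `vllm serve <model>`, which listens on port 8000 by default) and sending it standard chat-completions requests. A minimal sketch of such a request payload, using only the standard library; the model name and endpoint below are illustrative assumptions, not values from this page:

```python
import json

# Endpoint exposed by an OpenAI-compatible vLLM server started with
# `vllm serve <model>`; host, port, and model name are illustrative.
url = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "Qwen/Qwen2.5-0.5B-Instruct",  # example model, swap for yours
    "messages": [
        {"role": "user", "content": "Summarize vLLM in one sentence."}
    ],
    "max_tokens": 64,
    "temperature": 0.2,
}

# Serialize the body; send it with urllib.request, requests, or curl
# once the server is up.
body = json.dumps(payload)
print(body)
```

Because the API is OpenAI-compatible, existing OpenAI client code can usually be pointed at the vLLM server by changing only the base URL.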
Stars: 73,007
Forks: 14,312
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 912
Dependencies: 68
Reverse dependents: 46
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/vllm-project/vllm"
Open to everyone: 100 requests/day with no key needed; get a free key for 1,000/day.
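The API response can be consumed with a few lines of standard-library Python. The field names in the sample below are assumptions for illustration only; the actual schema is not documented on this page, so inspect a real response before relying on any key:

```python
import json

# Hypothetical payload shaped like the stats shown above; the key
# names ("stars", "commits_30d", ...) are assumed, not documented.
sample = (
    '{"stars": 73007, "forks": 14312, "commits_30d": 912,'
    ' "license": "Apache-2.0"}'
)

# To fetch live data instead, you could use:
#   urllib.request.urlopen("https://pt-edge.onrender.com/api/v1/quality/transformers/vllm-project/vllm")
data = json.loads(sample)
print(f"{data['stars']} stars, {data['commits_30d']} commits in the last 30 days")
```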
Related projects
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...
tenstorrent/tt-metal
TT-NN operator library, and TT-Metalium low-level kernel programming model.