vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Score: 87 / 100 (Verified)

This project helps machine learning engineers and developers efficiently deploy and serve large language models (LLMs) in production. You provide your chosen LLM and get back a high-throughput, memory-optimized inference service ready for use. It's aimed at ML engineers, MLOps specialists, and application developers who need to integrate LLM capabilities without sacrificing speed or cost efficiency.
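As a quick illustration of the offline (batch) use, here is a minimal sketch using vLLM's Python entry points, LLM and SamplingParams; the model name and sampling settings are placeholder examples, not recommendations:

# Minimal offline-inference sketch; requires: pip install vllm
from vllm import LLM, SamplingParams

prompts = [
    "The capital of France is",
    "Large language models are",
]
# Placeholder sampling settings.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Loads the model once; the engine manages the KV cache for throughput.
llm = LLM(model="facebook/opt-125m")  # placeholder model

# All prompts are batched together and scheduled by the engine.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)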

73,007 stars. Used by 46 other packages. Actively maintained with 912 commits in the last 30 days. Available on PyPI.

Use this if you need to run large language models at scale, serve many user requests simultaneously, and minimize the computational resources required.

Not ideal if you're a casual user looking for a pre-built chatbot or a simple API wrapper and don't want to manage the underlying model-serving infrastructure.
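For the "serve many user requests" case above, vLLM also exposes an OpenAI-compatible HTTP server. A sketch of querying one from Python, assuming the server was started locally with "vllm serve facebook/opt-125m" on the default port 8000 (the model name is a placeholder, and the api_key value is a dummy since no key is configured):

# Sketch: querying a locally running vLLM server via the openai client.
# Assumes: vllm serve facebook/opt-125m   (OpenAI-compatible, port 8000)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="facebook/opt-125m",  # must match the served model
    prompt="The capital of France is",
    max_tokens=32,
)
print(response.choices[0].text)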

Tags: LLM deployment, model serving, AI infrastructure, MLOps, API development
Maintenance: 22 / 25
Adoption: 15 / 25
Maturity: 25 / 25
Community: 25 / 25

How are scores calculated? Each of the four categories is scored out of 25, and here they sum to the overall score: 22 + 15 + 25 + 25 = 87 / 100.

Stars: 73,007
Forks: 14,312
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 912
Dependencies: 68
Reverse dependents: 46

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/vllm-project/vllm"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
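For programmatic use, here is a sketch of the same request from Python using only the standard library; the endpoint is assumed to return JSON, and its field names are not documented on this page, so the sketch just pretty-prints whatever comes back:

# Sketch: fetching the quality data shown above.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/vllm-project/vllm"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# Response schema isn't documented here, so just pretty-print it.
print(json.dumps(data, indent=2))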