Chen-zexi/vllm-cli

A command-line interface tool for serving LLMs using vLLM.

Score: 57 / 100 (Established)

This tool helps developers and ML engineers serve large language models (LLMs) efficiently on their own hardware. It takes local model files (such as those from Hugging Face or Ollama) and turns them into a high-performance, accessible service, managed either interactively or from the command line. ML practitioners who need to deploy and manage LLMs for applications or testing will find it useful.
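Under the hood, vLLM exposes an OpenAI-compatible HTTP API once a model is being served. As a minimal sketch (assuming a server is already running on vLLM's default port, 8000, and using a placeholder model name), a client can query the service with plain Python:

import requests

# Assumes a vLLM-backed server is already running locally on its
# default port (8000); "your-model-name" is a placeholder for
# whichever model was loaded.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "your-model-name",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])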

482 stars. Available on PyPI.

Use this if you need to serve one or more large language models from your own GPU-equipped machine with optimized performance and easy management.

Not ideal if you primarily use cloud-based LLM APIs or don't have access to CUDA-compatible GPU hardware.

Tags: LLM deployment, model serving, ML infrastructure, GPU optimization, local AI
Score breakdown (the four components sum to the overall 57 / 100):
Maintenance 10 / 25
Adoption 10 / 25
Maturity 24 / 25
Community 13 / 25


Stars: 482
Forks: 27
Language: Python
License: MIT
Last pushed: Jan 25, 2026
Commits (30d): 0
Dependencies: 10

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Chen-zexi/vllm-cli"

Open to everyone: 100 requests/day with no API key required. A free key raises the limit to 1,000 requests/day.
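The same data can be fetched programmatically. Here is a minimal Python sketch using the requests library; the response schema is not documented here, so it simply pretty-prints whatever JSON the endpoint returns:

import json

import requests

# Same endpoint as the curl command above.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Chen-zexi/vllm-cli"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()

# The response schema is undocumented here, so just pretty-print it.
print(json.dumps(resp.json(), indent=2))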