Chen-zexi/vllm-cli
A command-line interface tool for serving LLMs using vLLM.
This tool helps developers and ML engineers efficiently serve large language models (LLMs) on their own hardware. It takes your local LLM files (such as those from HuggingFace or Ollama) and turns them into a high-performance, accessible service, either interactively or via the command line. ML practitioners who need to deploy and manage LLMs for applications or testing will find this useful.
482 stars. Available on PyPI.
Use this if you need to serve one or more large language models from your own GPU-equipped machine with optimized performance and easy management.
Not ideal if you primarily use cloud-based LLM APIs or don't have access to CUDA-compatible GPU hardware.
Stars: 482
Forks: 27
Language: Python
License: MIT
Category:
Last pushed: Jan 25, 2026
Commits (30d): 0
Dependencies: 10
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Chen-zexi/vllm-cli"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
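
For scripted access, the endpoint shown above can be wrapped in a small helper. This is an illustrative sketch, not an official client: only the URL pattern comes from the example above, and the assumption that the endpoint returns JSON is ours.

```python
import json
import urllib.request

# Base endpoint taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_api_url(owner: str, repo: str) -> str:
    """Build the quality-data URL for a given GitHub owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the payload (requires network; assumes a JSON response)."""
    with urllib.request.urlopen(quality_api_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_api_url("Chen-zexi", "vllm-cli"))
```

Anonymous use is rate-limited to 100 requests/day, so a script polling many repos should register for a key.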
Related tools
AlexsJones/llmfit
Hundreds of models & providers. One command to find what runs on your hardware.
victordibia/llmx
An API for Chat Fine-Tuned Large Language Models (llm)
InftyAI/llmaz
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
livehl/aimirror
🚀 200x faster! The download accelerator for the AI era | Speeds up Docker/PyPI/HuggingFace/CRAN | Parallel chunking + smart caching make downloads fly
TakatoHonda/sui-lang
粋 (Sui) - A programming language optimized for LLM code generation