runpod-workers/worker-vllm
The RunPod serverless worker template for serving large language model endpoints, powered by vLLM.
This project helps developers deploy and manage large language models (LLMs) as highly performant, serverless API endpoints. It takes a chosen LLM (like Llama-3.1-8B-Instruct or OpenChat-3.5) and serves it through an API that's compatible with OpenAI's format. The primary users are developers who need to integrate custom LLM capabilities into their applications with speed and efficiency.
Use this if you are a developer looking to deploy your own large language models efficiently and scale them as serverless, OpenAI-compatible API endpoints.
Not ideal if you are an end-user without programming experience, as this tool requires familiarity with Docker, API configuration, and development workflows.
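Since the worker exposes an OpenAI-compatible API, calling a deployed endpoint amounts to sending a standard chat-completion request. The sketch below builds such a payload; the endpoint URL shape and the model name are illustrative assumptions, not values from this page — substitute your own endpoint ID and deployed model.

```python
import json

# Hypothetical RunPod endpoint URL -- replace <endpoint_id> with your own.
ENDPOINT_URL = "https://api.runpod.ai/v2/<endpoint_id>/openai/v1/chat/completions"

# OpenAI-compatible chat-completion request body; model name is an example.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "user", "content": "Summarize vLLM in one sentence."}
    ],
    "max_tokens": 128,
}

# Serialize exactly as an HTTP client (requests, urllib) would send it.
body = json.dumps(payload)
print(body)
```

Because the format matches OpenAI's, existing OpenAI client libraries can usually be pointed at the endpoint by overriding their base URL.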
Stars: 406
Forks: 290
Language: Python
License: MIT
Category:
Last pushed: Mar 10, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/runpod-workers/worker-vllm"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
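The same data can be fetched programmatically. A minimal helper for building the documented URL is sketched below; the response schema isn't shown on this page, so the sketch stops at the URL and leaves the actual fetch to your HTTP client of choice.

```python
from urllib.parse import quote

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-data API URL, escaping each path segment."""
    return f"{API_BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

url = quality_url("llm-tools", "runpod-workers", "worker-vllm")
print(url)
```

Fetching `url` with `urllib.request` or `requests` then returns the repository's quality data, subject to the rate limits above.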
Related tools
containers/ramalama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from...
av/harbor
One command brings a complete pre-wired LLM stack with hundreds of services to explore.
RunanywhereAI/runanywhere-sdks
Production ready toolkit to run AI locally
foldl/chatllm.cpp
Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)
FarisZahrani/llama-cpp-py-sync
Auto-synced CFFI ABI python bindings for llama.cpp with prebuilt wheels (CPU/CUDA/Vulkan/Metal).