runpod-workers/worker-vllm
The RunPod serverless worker template for serving large language model endpoints, powered by vLLM.
This project helps developers deploy and manage large language models (LLMs) as highly performant, serverless API endpoints. It takes a chosen LLM (like Llama-3.1-8B-Instruct or OpenChat-3.5) and serves it through an API that's compatible with OpenAI's format. The primary users are developers who need to integrate custom LLM capabilities into their applications with speed and efficiency.
Use this if you are a developer looking to deploy your own large language models efficiently and scale them as serverless, OpenAI-compatible API endpoints.
Not ideal if you are an end-user without programming experience, as this tool requires familiarity with Docker, API configuration, and development workflows.
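Since the worker exposes an OpenAI-compatible API, calling a deployed endpoint amounts to sending a standard chat-completion request. The sketch below builds such a payload; the endpoint URL shape and the model name are illustrative assumptions, not values from this page — substitute your own endpoint ID and deployed model.

```python
import json

# Hypothetical RunPod endpoint URL -- replace <endpoint_id> with your own.
ENDPOINT_URL = "https://api.runpod.ai/v2/<endpoint_id>/openai/v1/chat/completions"

# OpenAI-compatible chat-completion request body; model name is an example.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "user", "content": "Summarize vLLM in one sentence."}
    ],
    "max_tokens": 128,
}

# Serialize exactly as an HTTP client (requests, urllib) would send it.
body = json.dumps(payload)
print(body)
```

Because the format matches OpenAI's, existing OpenAI client libraries can usually be pointed at the endpoint by overriding their base URL.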
Stars: 406
Forks: 290
Language: Python
License: MIT
Category:
Last pushed: Mar 10, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/runpod-workers/worker-vllm"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
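The same data can be fetched programmatically. A minimal helper for building the documented URL is sketched below; the response schema isn't shown on this page, so the sketch stops at the URL and leaves the actual fetch to your HTTP client of choice.

```python
from urllib.parse import quote

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-data API URL, escaping each path segment."""
    return f"{API_BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

url = quality_url("llm-tools", "runpod-workers", "worker-vllm")
print(url)
```

Fetching `url` with `urllib.request` or `requests` then returns the repository's quality data, subject to the rate limits above.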
Related tools
containers/ramalama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from...
av/harbor
One command brings a complete pre-wired LLM stack with hundreds of services to explore.
RunanywhereAI/runanywhere-sdks
Production ready toolkit to run AI locally
foldl/chatllm.cpp
Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)
FarisZahrani/llama-cpp-py-sync
Auto-synced CFFI ABI python bindings for llama.cpp with prebuilt wheels (CPU/CUDA/Vulkan/Metal).