runpod-workers/worker-vllm

The RunPod worker template for serving large language model endpoints, powered by vLLM.

Score: 61 / 100 (Established)

This project helps developers deploy and manage large language models (LLMs) as highly performant, serverless API endpoints. It takes a chosen LLM (like Llama-3.1-8B-Instruct or OpenChat-3.5) and serves it through an API that's compatible with OpenAI's format. The primary users are developers who need to integrate custom LLM capabilities into their applications with speed and efficiency.
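
Because the worker exposes an OpenAI-compatible API, an existing OpenAI client library can typically be pointed straight at the deployed endpoint. The sketch below is a minimal, unverified Python example: the base URL pattern, the placeholder MY_ENDPOINT_ID, and the model name are assumptions to adapt to your own deployment; check the repository README for the exact URL format.

# Minimal sketch: querying a deployed worker-vllm endpoint with the OpenAI Python client.
# The base_url pattern, MY_ENDPOINT_ID, and the model name are placeholders/assumptions.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_RUNPOD_API_KEY",              # RunPod API key, not an OpenAI key
    base_url="https://api.runpod.ai/v2/MY_ENDPOINT_ID/openai/v1",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",   # whichever model the endpoint serves
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)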

Use this if you are a developer looking to deploy your own large language models efficiently and scale them as serverless, OpenAI-compatible API endpoints.

Not ideal if you are an end-user without programming experience, as this tool requires familiarity with Docker, API configuration, and development workflows.

Tags: AI-application-development, MLOps, API-development, backend-development, large-language-model-deployment
No package, no dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 25 / 25

Stars: 406
Forks: 290
Language: Python
License: MIT
Last pushed: Mar 10, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/runpod-workers/worker-vllm"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
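
The same data can be fetched programmatically. Below is a minimal Python sketch using the requests library against the open (no-key) endpoint shown above; it assumes the endpoint returns a JSON body.

# Minimal sketch: pulling the quality data for this repo from the public endpoint.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/runpod-workers/worker-vllm"
resp = requests.get(url, timeout=10)
resp.raise_for_status()          # fail loudly on rate limiting or server errors
data = resp.json()               # assumes a JSON response body
print(data)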