tensorchord/openmodelz
Autoscale LLM (vLLM, SGLang, LMDeploy) inference on Kubernetes (and others)
This project helps data scientists and SREs quickly deploy large language models (LLMs) and other machine learning models for production use. It takes a trained model and automatically provisions the necessary infrastructure (monitoring, autoscaling, and public access), exposing a ready-to-use public endpoint. It is for anyone who needs to take a machine learning model from development to production without getting bogged down in infrastructure setup.
281 stars. No commits in the last 6 months.
Use this if you need to deploy machine learning models, especially large language models, to a production environment quickly and efficiently without manually configuring all the underlying infrastructure.
Not ideal if you need extremely fine-grained control over every aspect of your infrastructure setup or are working with very small, simple models that don't require autoscaling.
Stars: 281
Forks: 25
Language: Go
License: Apache-2.0
Category: mlops
Last pushed: Nov 03, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/tensorchord/openmodelz"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
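For programmatic use, the curl command above can be reproduced in Python. This is a minimal sketch: the URL path layout (category/owner/repo) is taken from the example above, but the response schema is not documented here, so the code returns the decoded JSON as-is rather than assuming field names.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-API URL, following the path layout of the curl example."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload for a repository.

    The response fields are not documented on this page, so callers
    should inspect the returned dict rather than assume specific keys.
    """
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

# Example (performs a real HTTP request; subject to the 100 requests/day limit):
# data = fetch_quality("mlops", "tensorchord", "openmodelz")
# print(data)
```

Only `urllib` from the standard library is used, so the snippet runs without extra dependencies; swap in any HTTP client you prefer.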
Higher-rated alternatives
kubeflow/katib - Automated Machine Learning on Kubernetes
kubeai-project/kubeai - AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports...
sgl-project/rbg - A workload for deploying LLM inference services on Kubernetes
beam-cloud/beta9 - Ultrafast serverless GPU inference, sandboxes, and background jobs
ptimizeroracle/ondine - The LLM Dataset Engine — batch process millions of rows with 100+ providers. Multi-row batching...