sgl-project/ome
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton
The Open Model Engine (OME) helps machine learning infrastructure teams deploy and manage Large Language Models (LLMs) efficiently in their Kubernetes environments. Given model artifacts and serving configurations, it automatically provisions optimized serving runtimes, schedules GPU resources, and exposes ready-to-use inference endpoints. Its primary users are platform engineers, MLOps engineers, and infrastructure architects.
Use this if you need to standardize, automate, and optimize the deployment and serving of multiple LLMs on Kubernetes, ensuring efficient GPU utilization and high availability.
Not ideal if you are a data scientist primarily focused on developing and experimenting with models on a local machine rather than managing large-scale, production-grade deployments.
Stars: 393
Forks: 64
Language: Go
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sgl-project/ome"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
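A minimal sketch of calling the same endpoint from a script, for anyone who wants the data programmatically rather than via curl. The response schema is not documented on this page, so the field names are treated as unknown and the record is simply pretty-printed; everything beyond the URL above is an assumption.

# Sketch: fetch the quality record for sgl-project/ome from the public API.
# The JSON field names are not documented here, so the response is
# pretty-printed rather than accessed by key.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/sgl-project/ome"

def fetch_quality(url: str = URL) -> dict:
    """Fetch the repository quality record and return it as a dict."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    data = fetch_quality()
    # Pretty-print the whole record; pick out fields once the schema is known.
    print(json.dumps(data, indent=2))

The free tier (100 requests/day without a key) is enough for occasional lookups like this; add an API key header only if the service documents one for the 1,000/day tier.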
Related models
ludwig-ai/ludwig
Low-code framework for building custom LLMs, neural networks, and other AI models
withcatai/node-llama-cpp
Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema...
mudler/LocalAI
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and...
zhudotexe/kani
kani (カニ) is a highly hackable microframework for tool-calling language models. (NLP-OSS @ EMNLP 2023)
SciSharp/LLamaSharp
A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.