sgl-project/ome

Open Model Engine (OME): a Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton.

Score: 56 / 100 (Established)

The Open Model Engine (OME) helps machine learning infrastructure teams deploy and manage Large Language Models (LLMs) efficiently in their Kubernetes environments. Given model files and serving configurations, it automatically provisions optimized serving runtimes, manages GPU resources, and exposes ready-to-use inference endpoints. Platform engineers, MLOps engineers, and infrastructure architects are its primary users.


Use this if you need to standardize, automate, and optimize the deployment and serving of multiple LLMs on Kubernetes, ensuring efficient GPU utilization and high availability.

Not ideal if you are a data scientist primarily focused on developing and experimenting with models on a local machine rather than managing large-scale, production-grade deployments.

Tags: MLOps, LLM Deployment, Kubernetes Management, GPU Orchestration, AI Infrastructure
No package · No dependents
Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 15 / 25
Community: 21 / 25


Stars: 393
Forks: 64
Language: Go
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sgl-project/ome"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
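
For programmatic use, below is a minimal Go sketch of the same request. The endpoint URL is taken verbatim from the curl command above; since the response schema is not documented on this page, the sketch decodes the JSON into a generic map rather than assuming any field names.

package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Endpoint from the listing above; no API key is needed
	// for up to 100 requests/day.
	url := "https://pt-edge.onrender.com/api/v1/quality/transformers/sgl-project/ome"

	resp, err := http.Get(url)
	if err != nil {
		log.Fatalf("request failed: %v", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		log.Fatalf("unexpected status: %s", resp.Status)
	}

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatalf("reading response failed: %v", err)
	}

	// Decode into a generic map, since the field layout of the
	// response is an assumption, not documented here.
	var data map[string]any
	if err := json.Unmarshal(body, &data); err != nil {
		log.Fatalf("decoding JSON failed: %v", err)
	}
	for key, value := range data {
		fmt.Printf("%s: %v\n", key, value)
	}
}

Save as main.go and run with "go run main.go"; it prints each top-level field of the quality report, whatever the API returns.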