sgl-project/ome
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton
The Open Model Engine (OME) helps machine learning infrastructure teams deploy and manage Large Language Models (LLMs) efficiently in their Kubernetes environments. Given model artifacts and serving configurations, it automatically provisions optimized serving runtimes, schedules GPU resources, and exposes ready-to-use inference endpoints. Its primary users are platform engineers, MLOps engineers, and infrastructure architects.
Use this if you need to standardize, automate, and optimize the deployment and serving of multiple LLMs on Kubernetes, ensuring efficient GPU utilization and high availability.
Not ideal if you are a data scientist primarily focused on developing and experimenting with models on a local machine rather than managing large-scale, production-grade deployments.
Stars: 393
Forks: 64
Language: Go
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sgl-project/ome"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
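A minimal sketch of calling the same endpoint from a script, for anyone who wants the data programmatically rather than via curl. The response schema is not documented on this page, so the field names are treated as unknown and the record is simply pretty-printed; everything beyond the URL above is an assumption.

# Sketch: fetch the quality record for sgl-project/ome from the public API.
# The JSON field names are not documented here, so the response is
# pretty-printed rather than accessed by key.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/sgl-project/ome"

def fetch_quality(url: str = URL) -> dict:
    """Fetch the repository quality record and return it as a dict."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    data = fetch_quality()
    # Pretty-print the whole record; pick out fields once the schema is known.
    print(json.dumps(data, indent=2))

The free tier (100 requests/day without a key) is enough for occasional lookups like this; add an API key header only if the service documents one for the 1,000/day tier.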
Related models
ludwig-ai/ludwig
Low-code framework for building custom LLMs, neural networks, and other AI models
withcatai/node-llama-cpp
Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema...
mudler/LocalAI
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and...
zhudotexe/kani
kani (カニ) is a highly hackable microframework for tool-calling language models. (NLP-OSS @ EMNLP 2023)
SciSharp/LLamaSharp
A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.