openvinotoolkit/model_server
A scalable inference server for models optimized with OpenVINO™
This tool helps software developers deploy and manage machine learning models, including large language models and other generative AI models, in production environments. It takes trained models in standard formats (such as TensorFlow, ONNX, or OpenVINO IR) and exposes them over standard network protocols (REST or gRPC). The typical user is a software architect or MLOps engineer responsible for integrating AI models into applications.
836 stars. Actively maintained with 38 commits in the last 30 days.
Use this if you need a scalable and flexible way to serve machine learning models from various frameworks to client applications, especially in cloud or microservices-based architectures.
Not ideal if you are looking for a tool to train machine learning models or if your deployment needs are very simple and do not require high performance or remote inference capabilities.
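As a sketch of what a client interaction looks like: OpenVINO Model Server supports the KServe v2 REST protocol, where an inference request is a JSON body POSTed to `/v2/models/{name}/infer`. The server address, model name, input tensor name, and shape below are all hypothetical placeholders, not taken from any real deployment.

```python
import json

# Hypothetical server address and model name -- adjust for your deployment.
SERVER = "http://localhost:8000"
MODEL = "my_model"

# KServe v2 REST inference payload: each input declares a name, shape,
# datatype, and a flat list of values.
payload = {
    "inputs": [
        {
            "name": "input",  # must match the model's input tensor name
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3, 0.4],
        }
    ]
}

# The request would be sent as:
#   POST {SERVER}/v2/models/{MODEL}/infer
# (e.g. via requests or urllib); only the serialized body is shown here.
body = json.dumps(payload)
print(body)
```

The same models can also be reached over gRPC with the equivalent KServe `ModelInfer` call, which is generally preferable for high-throughput or low-latency clients.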
Stars
836
Forks
241
Language
C++
License
Apache-2.0
Category
Generative AI
Last pushed
Mar 13, 2026
Commits (30d)
38
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/openvinotoolkit/model_server"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
madroidmaq/mlx-omni-server
MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically...
NVIDIA-NeMo/Guardrails
NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based...
generative-computing/mellea
Mellea is a library for writing generative programs.
rhesis-ai/rhesis
Open-source platform & SDK for testing LLM and agentic apps. Define expected behavior, generate...
taco-group/OpenEMMA
OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.