kubeai-project/kubeai
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
KubeAI helps MLOps engineers and platform teams deploy and manage AI models such as large language models, embedding models, and speech-to-text systems on Kubernetes. It takes trained models and exposes them to applications, handling autoscaling, model caching, and request routing so that inference can be served reliably at scale.
1,161 stars. Actively maintained with 4 commits in the last 30 days.
Use this if you need to deploy and manage a variety of machine learning models (especially large language models or embedding models) on Kubernetes and want to optimize their performance and scalability without introducing complex dependencies.
Not ideal if you are looking for a simple tool for local model experimentation or if your inference workloads are very small-scale and don't require Kubernetes deployment.
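
KubeAI presents an OpenAI-compatible HTTP API in front of the models it serves, so applications can reuse standard OpenAI client code against the in-cluster endpoint. The sketch below illustrates that pattern in Python; the base URL (http://kubeai/openai/v1) and the model name (llama-3.1-8b-instruct) are assumptions for illustration and must match your cluster's Service and deployed Model resources.

from openai import OpenAI  # pip install openai

# KubeAI exposes an OpenAI-compatible API. The base_url assumes the default
# in-cluster Service name "kubeai"; adjust it for your installation.
client = OpenAI(
    base_url="http://kubeai/openai/v1",  # assumed in-cluster endpoint
    api_key="ignored",                   # KubeAI itself does not require an OpenAI key
)

# The model name is hypothetical; it must match a Model resource you deployed.
response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Summarize what KubeAI does."}],
)
print(response.choices[0].message.content)
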
Stars: 1,161
Forks: 125
Language: Go
License: Apache-2.0
Last pushed: Feb 23, 2026
Commits (30d): 4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/kubeai-project/kubeai"
Open to everyone: 100 requests/day with no API key required. A free key raises the limit to 1,000 requests/day.
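
To consume the same endpoint programmatically rather than via curl, a plain HTTP GET is enough; a minimal Python sketch follows. The response schema is not documented here, so the code simply prints the returned JSON instead of assuming field names, and the idea of passing a key via a request header is an assumption.

import requests  # pip install requests

URL = "https://pt-edge.onrender.com/api/v1/quality/mlops/kubeai-project/kubeai"

# Unauthenticated requests are limited to 100/day; with a free key the limit
# rises to 1,000/day (how the key is passed, e.g. via a header, is an
# assumption and should be checked against the API docs).
resp = requests.get(URL, timeout=10)
resp.raise_for_status()

# Print the payload to inspect the actual schema rather than guessing fields.
data = resp.json()
print(data)
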
Related tools
kubeflow/katib
Automated Machine Learning on Kubernetes
sgl-project/rbg
A workload for deploying LLM inference services on Kubernetes
beam-cloud/beta9
Ultrafast serverless GPU inference, sandboxes, and background jobs
optimizeroracle/ondine
The LLM Dataset Engine — batch process millions of rows with 100+ providers. Multi-row batching...
scitix/arks
Arks is a cloud-native inference framework running on Kubernetes