kubeai-project/kubeai
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
KubeAI helps MLOps engineers and platform teams deploy and manage AI models such as large language models, embedding models, and speech-to-text systems on Kubernetes. It takes trained models and exposes them to applications, handling autoscaling, model caching, and request routing so that inference can be served reliably at scale.
1,161 stars. Actively maintained with 4 commits in the last 30 days.
Use this if you need to deploy and manage a variety of machine learning models (especially large language models or embedding models) on Kubernetes and want to optimize their performance and scalability without introducing complex dependencies.
Not ideal if you are looking for a simple tool for local model experimentation or if your inference workloads are very small-scale and don't require Kubernetes deployment.
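
KubeAI presents an OpenAI-compatible HTTP API in front of the models it serves, so applications can reuse standard OpenAI client code against the in-cluster endpoint. The sketch below illustrates that pattern in Python; the base URL (http://kubeai/openai/v1) and the model name (llama-3.1-8b-instruct) are assumptions for illustration and must match your cluster's Service and deployed Model resources.

from openai import OpenAI  # pip install openai

# KubeAI exposes an OpenAI-compatible API. The base_url assumes the default
# in-cluster Service name "kubeai"; adjust it for your installation.
client = OpenAI(
    base_url="http://kubeai/openai/v1",  # assumed in-cluster endpoint
    api_key="ignored",                   # KubeAI itself does not require an OpenAI key
)

# The model name is hypothetical; it must match a Model resource you deployed.
response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Summarize what KubeAI does."}],
)
print(response.choices[0].message.content)
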
Stars: 1,161
Forks: 125
Language: Go
License: Apache-2.0
Last pushed: Feb 23, 2026
Commits (30d): 4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/kubeai-project/kubeai"
Open to everyone: 100 requests/day with no API key required. A free key raises the limit to 1,000 requests/day.
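
To consume the same endpoint programmatically rather than via curl, a plain HTTP GET is enough; a minimal Python sketch follows. The response schema is not documented here, so the code simply prints the returned JSON instead of assuming field names, and the idea of passing a key via a request header is an assumption.

import requests  # pip install requests

URL = "https://pt-edge.onrender.com/api/v1/quality/mlops/kubeai-project/kubeai"

# Unauthenticated requests are limited to 100/day; with a free key the limit
# rises to 1,000/day (how the key is passed, e.g. via a header, is an
# assumption and should be checked against the API docs).
resp = requests.get(URL, timeout=10)
resp.raise_for_status()

# Print the payload to inspect the actual schema rather than guessing fields.
data = resp.json()
print(data)
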
Related tools
kubeflow/katib
Automated Machine Learning on Kubernetes
sgl-project/rbg
A workload for deploying LLM inference services on Kubernetes
beam-cloud/beta9
Ultrafast serverless GPU inference, sandboxes, and background jobs
optimizeroracle/ondine
The LLM Dataset Engine — batch process millions of rows with 100+ providers. Multi-row batching...
scitix/arks
Arks is a cloud-native inference framework running on Kubernetes