kubeflow/trainer
Distributed AI Model Training and LLM Fine-Tuning on Kubernetes
This platform helps AI practitioners efficiently train and fine-tune large AI models, including Large Language Models (LLMs), on powerful computing clusters. You provide your model architecture and training data, and the platform manages the complexity of distributing the workload across multiple GPUs and machines to produce an optimized, trained model. It's designed for AI engineers, data scientists, and ML researchers working on large-scale AI projects.
2,050 stars. Actively maintained with 29 commits in the last 30 days.
Use this if you need to train or fine-tune very large AI models, especially LLMs, and require a scalable, distributed system to manage your multi-GPU and multi-node computing resources efficiently.
Not ideal if you are working with smaller AI models that can be trained on a single machine, or if you prefer not to manage Kubernetes-based infrastructure for your AI workloads.
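Under the hood, training runs are declared as Kubernetes custom resources. The fragment below is a minimal sketch of a TrainJob, assuming the `trainer.kubeflow.org/v1alpha1` API group and field names (`runtimeRef`, `numNodes`, `resourcesPerNode`) used in the project's examples; verify against the repo's own manifests before use:

```yaml
# Hypothetical TrainJob sketch: 2 nodes, 1 GPU each,
# referencing a PyTorch distributed runtime by name.
apiVersion: trainer.kubeflow.org/v1alpha1
kind: TrainJob
metadata:
  name: example-train-job
spec:
  runtimeRef:
    name: torch-distributed
  trainer:
    numNodes: 2
    resourcesPerNode:
      requests:
        nvidia.com/gpu: 1
```

Applying a resource like this (`kubectl apply -f trainjob.yaml`) hands scheduling and multi-node coordination to the controller rather than your own scripts.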
Stars: 2,050
Forks: 925
Language: Go
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 29
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/kubeflow/trainer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
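The same endpoint can be called from code. A small Python sketch, assuming the URL pattern `/api/v1/quality/<category>/<owner>/<repo>` generalizes from the single example above (the path segments beyond that example are an inference, not documented here):

```python
# Query the quality API shown above using only the standard library.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload (100 requests/day without a key)."""
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

print(quality_url("mlops", "kubeflow", "trainer"))
```

With an API key, you would presumably pass it as a header or query parameter; the listing above does not specify the mechanism, so check the service's docs.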
Related tools
nndeploy/nndeploy
An Easy-to-Use and High-Performance AI Deployment Framework
bentoml/BentoML
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps,...
cncf/llm-in-action
🤖 Discover how to apply your LLM app skills on Kubernetes!
llmcloud24/de.KCD-Summer-School-2024
Learn how to deploy your own LLM in the de.NBI cloud via a step-by-step guided journey...
ray-project/llms-in-prod-workshop-2023
Deploy and Scale LLM-based applications