kubeflow/trainer

Distributed AI Model Training and LLM Fine-Tuning on Kubernetes

Score: 71 / 100 · Verified

This platform helps AI practitioners train and fine-tune large models, including Large Language Models (LLMs), on Kubernetes clusters. You provide your model architecture and training data, and the platform distributes the workload across multiple GPUs and nodes to produce the trained model. It's designed for AI engineers, data scientists, and ML researchers working on large-scale projects.
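As a rough illustration of the workflow above, training is described declaratively and submitted to the cluster. A minimal sketch of a TrainJob manifest, assuming the `trainer.kubeflow.org/v1alpha1` API and a `torch-distributed` runtime installed on the cluster (names and field values here are placeholders; check the project docs for the exact schema):

```yaml
apiVersion: trainer.kubeflow.org/v1alpha1
kind: TrainJob
metadata:
  name: fine-tune-example        # placeholder job name
spec:
  runtimeRef:
    name: torch-distributed      # assumed pre-installed training runtime
  trainer:
    numNodes: 2                  # distribute training across two nodes
    resourcesPerNode:
      limits:
        nvidia.com/gpu: 4        # request four GPUs per node
```

The controller then handles scheduling the worker pods and wiring up the distributed training environment across them.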

2,050 stars. Actively maintained with 29 commits in the last 30 days.

Use this if you need to train or fine-tune very large AI models, especially LLMs, and require a scalable, distributed system to manage your multi-GPU and multi-node computing resources efficiently.

Not ideal if you are working with smaller AI models that can be trained on a single machine, or if you prefer not to manage Kubernetes-based infrastructure for your AI workloads.

Tags: AI model training · LLM fine-tuning · distributed machine learning · high-performance computing · machine learning operations

No Package · No Dependents
Maintenance: 20 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 25 / 25


Stars: 2,050
Forks: 925
Language: Go
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 29

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/mlops/kubeflow/trainer"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
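The same endpoint can be consumed programmatically. A minimal Python sketch of working with the response, assuming it is JSON containing the four sub-scores shown above (the field names here are hypothetical; the values are taken from this page):

```python
import json

def overall_score(payload: dict) -> int:
    """Sum the four sub-scores (each out of 25) into the 0-100 total."""
    parts = ("maintenance", "adoption", "maturity", "community")
    return sum(payload["scores"][p] for p in parts)

# Sample response shape (hypothetical field names, values from this page).
sample = json.loads("""
{
  "repo": "kubeflow/trainer",
  "scores": {"maintenance": 20, "adoption": 10, "maturity": 16, "community": 25}
}
""")

print(overall_score(sample))  # 20 + 10 + 16 + 25 = 71
```

In practice you would fetch the JSON from the URL above (e.g. with `urllib.request` or `requests`) and pass the decoded body to the same helper.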