sql-machine-learning/elasticdl
Kubernetes-native Deep Learning Framework
This framework helps machine learning engineers train deep learning models on a Kubernetes cluster using their existing TensorFlow or PyTorch code. It takes your model definition and training data, then distributes the training process across the cluster. The output is a trained deep learning model, produced with better resource utilization and without interruption from system failures.
746 stars. No commits in the last 6 months.
Use this if you are a machine learning engineer running deep learning training on a Kubernetes cluster and need better fault tolerance and elastic resource scheduling.
Not ideal if you do not run your deep learning infrastructure on Kubernetes, or if you prefer the native distributed training features of TensorFlow or PyTorch without external orchestration.
Stars: 746
Forks: 116
Language: Python
License: MIT
Category:
Last pushed: Jan 26, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/sql-machine-learning/elasticdl"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
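The same request can be made from a script. The following is a minimal sketch in Python using only the standard library. It assumes the endpoint returns JSON and that the path follows a category/owner/repository pattern; both are inferred from the single curl example above, not from documented behavior. The names `build_quality_url` and `fetch_quality` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    The category/owner/repo layout is an assumption inferred from the
    one URL shown in the listing.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the listing data, assuming the response body is JSON."""
    url = build_quality_url(category, owner, repo)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network call; counts against the daily request limit.
    data = fetch_quality("ml-frameworks", "sql-machine-learning", "elasticdl")
    print(json.dumps(data, indent=2))
```

Because the response schema is not described here, the sketch prints the decoded payload as-is instead of reading specific fields.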
Higher-rated alternatives
muna-ai/muna-py
Run AI models anywhere. https://muna.ai/explore
clearml/clearml-pycharm-plugin
ClearML PyCharm Plugin
microsoft/AKSDeploymentTutorial
Tutorial on how to deploy Deep Learning models on GPU enabled Kubernetes cluster
Langhalsdino/Kubernetes-GPU-Guide
This guide should help fellow researchers and hobbyists to easily automate and accelerate their...
tamohannes/urartu
Build ML pipelines with smart caching and remote execution. Develop locally, deploy to HPC...