Distributed Training Frameworks
Frameworks and libraries for distributed training of machine learning models across multiple GPUs, nodes, or devices using data parallelism, model parallelism, or hybrid approaches. Does NOT include single-machine training optimization, inference frameworks, or educational tutorials on distributed concepts without working implementations.
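The category definition above turns on two strategies: data parallelism and model parallelism. The sketch below illustrates both inside a single Python process, assuming a toy linear model. It is not taken from any listed framework. Real systems run these steps on separate devices and use a collective-communication library. Every function name, the toy data, and the worker and device counts here are hypothetical.

- **Data parallelism:** each simulated worker holds a full copy of the weight and computes a gradient on its own shard of the batch. An averaging step, standing in for an all-reduce, combines the gradients before the shared update.
- **Model parallelism:** a weight matrix is split by rows across simulated devices. The partial outputs are then concatenated.

```python
# Single-process simulation of two parallelism strategies (illustrative only;
# no real devices, processes, or communication library are involved).


def grad_mse(w, batch):
    """Mean-squared-error gradient dL/dw for the toy model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def all_reduce_mean(values):
    """Stand-in for an all-reduce collective: every worker ends up with the mean."""
    return sum(values) / len(values)


def data_parallel_step(w, batch, num_workers, lr):
    """One synchronous data-parallel SGD step with equally sized shards."""
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    shard = len(batch) // num_workers
    # Each simulated worker holds a replica of w and one contiguous slice of the batch.
    local_grads = [grad_mse(w, batch[i * shard:(i + 1) * shard]) for i in range(num_workers)]
    # Averaging equal-size shard gradients reproduces the full-batch gradient.
    return w - lr * all_reduce_mean(local_grads)


def matvec(W, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def model_parallel_matvec(W, x, num_devices):
    """Split W's rows (output features) across simulated devices, then gather."""
    rows_per_device = -(-len(W) // num_devices)  # ceiling division
    parts = [W[i:i + rows_per_device] for i in range(0, len(W), rows_per_device)]
    partial_outputs = [matvec(part, x) for part in parts]  # one slice per "device"
    return [y for out in partial_outputs for y in out]  # concatenate the slices


if __name__ == "__main__":
    batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    print(data_parallel_step(0.0, batch, num_workers=2, lr=0.01))
    print(model_parallel_matvec([[1, 2], [3, 4], [5, 6]], [1, 1], num_devices=2))
```

The data-parallel step matches a single-worker step only because the shards are the same size. With unequal shards, the gradients would have to be weighted by shard size before averaging.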
There are 121 distributed training frameworks tracked. Two score above 70 (Verified tier). The highest-rated is deepspeedai/DeepSpeed at 81/100, with 41,801 stars. Two of the top 10 are actively maintained.
Get the projects as JSON (the example request below sets `limit=20`):
```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=distributed-training-frameworks&limit=20"
```
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
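The same request can be made from Python using only the standard library. This is a minimal sketch with a few stated limits:

- It reuses the endpoint and query parameters from the curl example and leaves `limit` adjustable. This page does not say whether a larger value returns all 121 entries.
- It returns the decoded JSON as-is, because the response schema is not documented here.
- It does not handle the free API key, because this page does not say how the key is supplied.

The helper names `build_query_url` and `fetch_projects` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query parameters copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_query_url(domain="ml-frameworks",
                    subcategory="distributed-training-frameworks",
                    limit=20):
    """Rebuild the example request URL with an adjustable limit (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30.0):
    """GET the endpoint and return the decoded JSON without assuming its structure."""
    with urllib.request.urlopen(build_query_url(limit=limit), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(build_query_url())
    # Each real call counts toward the anonymous 100 requests/day quota:
    # data = fetch_projects(limit=20)
```

Keeping URL construction separate from the network call lets the query be checked offline before any quota is spent.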
| # | Framework | Description | Tier |
|---|---|---|---|
| 1 | deepspeedai/DeepSpeed | DeepSpeed is a deep learning optimization library that makes distributed... | Verified |
| 2 | helmholtz-analytics/heat | Distributed tensors and Machine Learning framework with GPU and MPI... | Verified |
| 3 | hpcaitech/ColossalAI | Making large AI models cheaper, faster and more accessible | Established |
| 4 | horovod/horovod | Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. | Established |
| 5 | bsc-wdc/dislib | The Distributed Computing library for python implemented using PyCOMPSs... | Established |
| 6 | learning-at-home/hivemind | Decentralized deep learning in PyTorch. Built to train models on thousands... | Established |
| 7 | xorbitsai/xorbits | Scalable Python DS & ML, in an API compatible & lightning fast way. | Established |
| 8 | google/sedpack | Sedpack - Scalable and efficient data packing | Established |
| 9 | HazyResearch/fonduer | A knowledge base construction engine for richly formatted data | Established |
| 10 | btursunbayev/nvsonar | Active GPU diagnostic tool that identifies performance bottlenecks using micro-probes | Established |
| 11 | cylondata/cylon | Cylon is a fast, scalable, distributed memory, parallel runtime with a... | Established |
| 12 | kakaobrain/torchgpipe | A GPipe implementation in PyTorch | Established |
| 13 | spotify/pythonflow | 🐍 Dataflow programming for python. | Established |
| 14 | fastai/fastgpu | A queue service for quickly developing scripts that use all your GPUs efficiently | Established |
| 15 | BaguaSys/bagua | Bagua Speeds up PyTorch | Established |
| 16 | cerndb/dist-keras | Distributed Deep Learning, with a focus on distributed training, using Keras... | Established |
| 17 | TGSAI/mdio-python | Cloud native, scalable storage engine for various types of energy data. | Established |
| 18 | IBM/FfDL | Fabric for Deep Learning (FfDL, pronounced fiddle) is a Deep Learning... | Established |
| 19 | maxpumperla/elephas | Distributed Deep learning with Keras & Spark | Established |
| 20 | Mitchell-Mirano/sorix | Sorix, high performance, easy to learn, fast to code, from prototype to production | Established |
| 21 | NimbleBoxAI/nbox | The official python package for NimbleBox. Exposes all APIs as CLIs and... | Established |
| 22 | h2oai/h2o4gpu | H2Oai GPU Edition | Established |
| 23 | PanJinquan/Pytorch-Base-Trainer | PyTorch distributed training framework | Emerging |
| 24 | sehoffmann/dmlcloud | Painless distributed training with torch | Emerging |
| 25 | saforem2/ezpz | Train across all your devices, ezpz 🍋 | Emerging |
| 26 | aksnzhy/xlearn | High performance, easy-to-use, and scalable machine learning (ML) package,... | Emerging |
| 27 | PaddlePaddle/PaddleCloud | PaddlePaddle Docker images and K8s operators for PaddleOCR/Detection... | Emerging |
| 28 | Hsword/Hetu | A high-performance distributed deep learning system targeting large-scale... | Emerging |
| 29 | bytedance/byteps | A high performance and generic framework for distributed DNN training | Emerging |
| 30 | mars-project/mars | Mars is a tensor-based unified framework for large-scale data computation... | Emerging |
| 31 | alibaba/EasyParallelLibrary | Easy Parallel Library (EPL) is a general and efficient deep learning... | Emerging |
| 32 | determined-ai/determined | Determined is an open-source machine learning platform that simplifies... | Emerging |
| 33 | nf-core/deepmodeloptim | Stochastic Testing and Input Manipulation for Unbiased Learning Systems | Emerging |
| 34 | lynxkite/lynxkite | The complete graph data science platform | Emerging |
| 35 | Oneflow-Inc/libai | LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training | Emerging |
| 36 | firmai/pandapy | PandaPy has the speed of NumPy and the usability of Pandas 10x to 50x faster... | Emerging |
| 37 | array2d/deepx | Large-scale Auto-Distributed Training/Inference Unified Framework \|... | Emerging |
| 38 | uber/fiber | Distributed Computing for AI Made Simple | Emerging |
| 39 | williamFalcon/test-tube | Python library to easily log experiments and parallelize hyperparameter... | Emerging |
| 40 | unslothai/hyperlearn | 2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old. | Emerging |
| 41 | BBEK-Anand/PyTorchLabFlow | To manage PyTorch experiments with ease, analyse all components of training pipeline. | Emerging |
| 42 | IntelPython/sdc | Numba extension for compiling Pandas data frames, Intel® Scalable Dataframe Compiler | Emerging |
| 43 | allenai/tango | Organize your experiments into discrete steps that can be cached and reused... | Emerging |
| 44 | flow2ml/Flow2ML | An Open Source Library to make Machine Learning process much Simpler | Emerging |
| 45 | Asthestarsfalll/ExCore | A Modern Configuration/Registry System designed for deeplearning, with some utils. | Emerging |
| 46 | geoffxy/habitat | 🔮 Execution time predictions for deep neural network training iterations... | Emerging |
| 47 | lucasbrianpiveta/Hetu-DiT | 🚀 Optimize your Diffusion Transformers with Hetu-DiT, a dynamic parallel... | Emerging |
| 48 | ravenprotocol/ravnest | Decentralized Asynchronous Training on Heterogeneous Devices | Emerging |
| 49 | rkhan055/SHADE | SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training | Emerging |
| 50 | deepfinch/XLearning-GPU | qihoo360 xlearning with GPU support; AI on Hadoop | Emerging |
| 51 | openclimatefix/ocf_datapipes | OCF's DataPipe based dataloader for training and inference | Emerging |
| 52 | rentainhe/pytorch-distributed-training | Simple tutorials on Pytorch DDP training | Emerging |
| 53 | hora-search/horapy | 🐍 Python binding for the Hora Approximate Nearest Neighbor Search Algorithm library | Emerging |
| 54 | hkproj/pytorch-transformer-distributed | Distributed training (multi-node) of a Transformer model | Emerging |
| 55 | r-xla/stablehlo | Create stableHLO programs in R | Emerging |
| 56 | paypal/gators | Gators is a package to handle model building with big data and fast... | Emerging |
| 57 | adalkiran/distributed-inference | A project to demonstrate an approach to designing cross-language and... | Emerging |
| 58 | alibaba/TePDist | TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed... | Emerging |
| 59 | hegongshan/Storage-for-AI-Paper | Accelerating AI Training and Inference from Storage Perspective (Must-read... | Emerging |
| 60 | eagomez2/moduleprofiler | Free open-source package to profile PyTorch models. | Emerging |
| 61 | neelsomani/kv-marketplace | Cross-GPU KV Cache Marketplace | Emerging |
| 62 | gmasse/gpu-specs | This project aims to centralize detailed specifications for GPUs,... | Emerging |
| 63 | AlibabaPAI/FlashModels | Fast and easy distributed model training examples. | Emerging |
| 64 | lsds/Crossbow | Crossbow: A Multi-GPU Deep Learning System for Training with Small Batch Sizes | Emerging |
| 65 | CEA-LIST/RPCDataloader | A variant of the PyTorch Dataloader using remote workers. | Emerging |
| 66 | gsyang33/Driple | 🚨 Prediction of the Resource Consumption of Distributed Deep Learning Systems | Emerging |
| 67 | NERSC/dl-at-scale-training | Deep Learning at Scale Training Event at NERSC | Emerging |
| 68 | NERSC/sc25-dl-tutorial | Deep Learning at Scale @ SC25 | Emerging |
| 69 | Youhe-Jiang/IJCAI2023-OptimalShardedDataParallel | [IJCAI2023] An automated parallel training system that combines the... | Emerging |
| 70 | earthai-tech/gofast | gofast: AIO machine learning package | Emerging |
| 71 | ANRGUSC/ML_onChain | A python-solidity translator that generates on-chain neural networks | Emerging |
| 72 | google/iopddl | Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on... | Emerging |
| 73 | astariul/gibbs | Scale your ML workers asynchronously across processes and machines | Emerging |
| 74 | NERSC/dl4sci25-dl-at-scale | Deep learning for science school material 2025 | Emerging |
| 75 | PLCnext/MLnext-Framework | MLnext Framework is an open source framework for hardware independent... | Emerging |
| 76 | qhliu26/Dive-into-Big-Model-Training | 📑 Dive into Big Model Training | Emerging |
| 77 | lt-asset/D3 | D3: Differential Testing of Distributed Deep Learning with Model... | Experimental |
| 78 | rasbt/b3-basic-batchsize-benchmark | Experiments for the blog post "No, We Don't Have to Choose Batch Sizes As... | Experimental |
| 79 | siboehm/ShallowSpeed | Small scale distributed training of sequential deep learning models, built... | Experimental |
| 80 | cake-lab/DELI | Optimizing loading training data from cloud bucket storage for cloud-based... | Experimental |
| 81 | alvarobartt/ml-monitoring-with-wandb | 🕵️🤖 Monitoring a PyTorch Lightning CNN with Weights & Biases | Experimental |
| 82 | marcos-venicius/smlf | A small machine learning framework with ONLY python and math | Experimental |
| 83 | yanisZirem/prism-profiler | profiler desktop versions | Experimental |
| 84 | Kushalk0677/Priority-Aware-Adaptive-Scheduling-for-Multi-Model-Edge-AI-Systems | Priority-Aware Edge Scheduler (PAES) for concurrent multi-model AI inference... | Experimental |
| 85 | 0xNaN/edufsdp | A minimal, educational implementation of Fully Sharded Data Parallel (FSDP). | Experimental |
| 86 | ashishpatel26/Rapidsai_Machine_learning_on_GPU | Rapidsai_Machine_learnring_on_GPU | Experimental |
| 87 | mrtan-ys/RoleML | Role-oriented programming model for distributed ML | Experimental |
| 88 | Continuum-Intelligence/continuum-hydra | Performance-first ML systems toolkit for environment diagnostics and... | Experimental |
| 89 | AbdelStark/nostrain | Coordinator-free distributed ML training over Nostr relays. | Experimental |
| 90 | poojakira/Predictive-GPU-Memory-Defragmenter | A production-grade Transformer-driven system that predicts GPU memory... | Experimental |
| 91 | Szhuaa/PyFlightProfiler | 🌟 Boost Python application performance with PyFlightProfiler, a toolbox for... | Experimental |
| 92 | rogue-agent1/markov-chain-py | Markov chain simulation with stationary distribution | Experimental |
| 93 | rogue-agent1/toml2json | toml2json - Convert between TOML and JSON. | Experimental |
| 94 | Arakiss/hecate-os | Linux distro with automatic hardware detection and per-system optimization.... | Experimental |
| 95 | alpha-one-index/ai-infra-index | Comprehensive technical reference for AI hardware: GPUs, TPUs, inference... | Experimental |
| 96 | JeffWigger/FastDynamicBatcher | FastDynamicBatcher is a library for batching inputs across requests to... | Experimental |
| 97 | rogue-agent1/yamltoml | Convert between JSON, YAML, and TOML formats. | Experimental |
| 98 | DaveAldon/Distributed-ML-with-MLX | 🍎👉🍏 Everything you need in order to get started building distributed machine... | Experimental |
| 99 | Dev-next-gen/Bittensor-rocm | ROCm-compatible fork of Bittensor – Full PyTorch 2.4 ROCm support – Wallet,... | Experimental |
| 100 | chirasin99/hecate-os | ⚙️ Optimize your Linux experience with HecateOS, a performance-driven... | Experimental |
| 101 | JagjeevanAK/CruxML | (Under-Development) A minified Machine Learning and Deep learning Framework/Library. | Experimental |
| 102 | dlzou/computron | Serving distributed deep learning models with model parallel swapping. | Experimental |
| 103 | michael-borck/loco-convoy | Documentation and experiments for running AI inference workloads across multiple GPUs | Experimental |
| 104 | explcre/pipeDejavu | pipeDejavu: Hardware-aware Latency Predictable, Differentiable Search for... | Experimental |
| 105 | GUT-AI/memory-bottleneck | Memory Bottleneck of Deep Learning models | Experimental |
| 106 | explcre/SHUKUN-Technology-AlgorithmIntern-MultiNodeTraining-for-DLmodels-Horovod-ConfigurationTutorial-Perf | SHUKUN Technology Co.,Ltd Algorithm intern (2020/12-2021/5). Multi-GPU,... | Experimental |
| 107 | Kritim708/multi-gpu-deep-learning-nvidia-workshop | This repository contains a project I created as part of the NVIDIA workshop... | Experimental |
| 108 | gdf-ai/gdf | Open-source community GPU network for distributed AI model training | Experimental |
| 109 | Prelf1992/distributed-ml-training-system | A proof-of-concept for a distributed machine learning training system,... | Experimental |
| 110 | Pects1949/Python-Distributed-ML-Framework | A Python framework for distributed machine learning training, leveraging... | Experimental |
| 111 | Gaius-del/python_hpc_2025 | 🚀 Accelerate scientific applications in supercomputing with Python using... | Experimental |
| 112 | ArslanKamchybekov/raydar | Raydar is the smart lost and found platform designed specifically for UIC... | Experimental |
| 113 | Jason-Wang313/OmniTrace | A full-stack GPU profiling and simulation framework that bridges high-level... | Experimental |
| 114 | dsrhaslab/prisma | A data prefetching storage data plane for accelerating DL training performance. | Experimental |
| 115 | olehxch/mlx-neural-networks | 💻 Explore the art and science of neural networks through hands-on examples... | Experimental |
| 116 | LiYanan2004/MLXPlayground | The basis of mlx for beginners like me. You can try out mlx code and check... | Experimental |
| 117 | Mamiglia/mergecraft | Mergecraft is a simple library to streamline model merging operations, with... | Experimental |
| 118 | abhisheks-gh/Veritas_Predictive-Caching-for-File-Systems | Developed for Veritas Technologies LLC, this project optimizes DB workloads... | Experimental |
| 119 | shivangraval50/distributed-ml-training | Distributed ML training platform achieving 10.6× speedup \| PyTorch DDP \|... | Experimental |
| 120 | ajulyav/DL-multiple-GPU | Some important main concepts on training DL models on multiple GPUs | Experimental |
| 121 | kirillsaidov/prisma | A tiny deep learning library aimed at ease of use and usability. | Experimental |