Distributed Training Frameworks

Frameworks and libraries for distributed training of machine learning models across multiple GPUs, nodes, or devices using data parallelism, model parallelism, or hybrid approaches. Does NOT include single-machine training optimization, inference frameworks, or educational tutorials on distributed concepts without working implementations.

There are 121 distributed training frameworks tracked. 2 score 70 or above (Verified tier). The highest-rated is deepspeedai/DeepSpeed at 81/100 with 41,801 stars. 2 of the top 10 are actively maintained.

Get the 121 projects as JSON (the request below sets limit=20)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=distributed-training-frameworks&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
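
The same request can be made from Python. The following is a minimal sketch using only the standard library. The endpoint and query parameters are copied from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described here, the parsed JSON is returned unchanged rather than unpacked into assumed fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken verbatim from the curl example.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the dataset URL for this category (hypothetical helper)."""
    params = {
        "domain": "ml-frameworks",
        "subcategory": "distributed-training-frameworks",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and parse the JSON response (hypothetical helper).

    The response layout is an unknown here, so the decoded JSON
    is returned as-is for the caller to inspect.
    """
    with urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Each call counts against the daily request quota.
    data = fetch_projects()
    print(type(data))
```

Whether a larger `limit` value returns all 121 projects in one call is not stated here, so treat that as something to verify against the API itself.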

# Framework Score Tier
1 deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed...

81
Verified
2 helmholtz-analytics/heat

Distributed tensors and Machine Learning framework with GPU and MPI...

70
Verified
3 hpcaitech/ColossalAI

Making large AI models cheaper, faster and more accessible

68
Established
4 horovod/horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

67
Established
5 bsc-wdc/dislib

The Distributed Computing library for python implemented using PyCOMPSs...

63
Established
6 learning-at-home/hivemind

Decentralized deep learning in PyTorch. Built to train models on thousands...

62
Established
7 xorbitsai/xorbits

Scalable Python DS & ML, in an API compatible & lightning fast way.

61
Established
8 google/sedpack

Sedpack - Scalable and efficient data packing

59
Established
9 HazyResearch/fonduer

A knowledge base construction engine for richly formatted data

57
Established
10 btursunbayev/nvsonar

Active GPU diagnostic tool that identifies performance bottlenecks using micro-probes

56
Established
11 cylondata/cylon

Cylon is a fast, scalable, distributed memory, parallel runtime with a...

56
Established
12 kakaobrain/torchgpipe

A GPipe implementation in PyTorch

55
Established
13 spotify/pythonflow

🐍 Dataflow programming for python.

55
Established
14 fastai/fastgpu

A queue service for quickly developing scripts that use all your GPUs efficiently

54
Established
15 BaguaSys/bagua

Bagua Speeds up PyTorch

53
Established
16 cerndb/dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras...

51
Established
17 TGSAI/mdio-python

Cloud native, scalable storage engine for various types of energy data.

51
Established
18 IBM/FfDL

Fabric for Deep Learning (FfDL, pronounced fiddle) is a Deep Learning...

51
Established
19 maxpumperla/elephas

Distributed Deep learning with Keras & Spark

51
Established
20 Mitchell-Mirano/sorix

Sorix, high performance, easy to learn, fast to code, from prototype to production

50
Established
21 NimbleBoxAI/nbox

The official python package for NimbleBox. Exposes all APIs as CLIs and...

50
Established
22 h2oai/h2o4gpu

H2Oai GPU Edition

50
Established
23 PanJinquan/Pytorch-Base-Trainer

PyTorch distributed training framework

49
Emerging
24 sehoffmann/dmlcloud

Painless distributed training with torch

49
Emerging
25 saforem2/ezpz

Train across all your devices, ezpz 🍋

49
Emerging
26 aksnzhy/xlearn

High performance, easy-to-use, and scalable machine learning (ML) package,...

49
Emerging
27 PaddlePaddle/PaddleCloud

PaddlePaddle Docker images and K8s operators for PaddleOCR/Detection...

49
Emerging
28 Hsword/Hetu

A high-performance distributed deep learning system targeting large-scale...

48
Emerging
29 bytedance/byteps

A high performance and generic framework for distributed DNN training

48
Emerging
30 mars-project/mars

Mars is a tensor-based unified framework for large-scale data computation...

47
Emerging
31 alibaba/EasyParallelLibrary

Easy Parallel Library (EPL) is a general and efficient deep learning...

47
Emerging
32 determined-ai/determined

Determined is an open-source machine learning platform that simplifies...

47
Emerging
33 nf-core/deepmodeloptim

Stochastic Testing and Input Manipulation for Unbiased Learning Systems

47
Emerging
34 lynxkite/lynxkite

The complete graph data science platform

47
Emerging
35 Oneflow-Inc/libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training

47
Emerging
36 firmai/pandapy

PandaPy has the speed of NumPy and the usability of Pandas 10x to 50x faster...

46
Emerging
37 array2d/deepx

Large-scale Auto-Distributed Training/Inference Unified Framework |...

45
Emerging
38 uber/fiber

Distributed Computing for AI Made Simple

45
Emerging
39 williamFalcon/test-tube

Python library to easily log experiments and parallelize hyperparameter...

44
Emerging
40 unslothai/hyperlearn

2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.

44
Emerging
41 BBEK-Anand/PyTorchLabFlow

To manage PyTorch experiments with ease, analyse all components of training pipeline.

44
Emerging
42 IntelPython/sdc

Numba extension for compiling Pandas data frames, Intel® Scalable Dataframe Compiler

43
Emerging
43 allenai/tango

Organize your experiments into discrete steps that can be cached and reused...

43
Emerging
44 flow2ml/Flow2ML

An Open Source Library to make Machine Learning process much Simpler

43
Emerging
45 Asthestarsfalll/ExCore

A Modern Configuration/Registry System designed for deeplearning, with some utils.

42
Emerging
46 geoffxy/habitat

🔮 Execution time predictions for deep neural network training iterations...

42
Emerging
47 lucasbrianpiveta/Hetu-DiT

🚀 Optimize your Diffusion Transformers with Hetu-DiT, a dynamic parallel...

41
Emerging
48 ravenprotocol/ravnest

Decentralized Asynchronous Training on Heterogeneous Devices

40
Emerging
49 rkhan055/SHADE

SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training

40
Emerging
50 deepfinch/XLearning-GPU

qihoo360 xlearning with GPU support; AI on Hadoop

39
Emerging
51 openclimatefix/ocf_datapipes

OCF's DataPipe based dataloader for training and inference

39
Emerging
52 rentainhe/pytorch-distributed-training

Simple tutorials on Pytorch DDP training

39
Emerging
53 hora-search/horapy

🐍 Python binding for the Hora Approximate Nearest Neighbor Search Algorithm library

38
Emerging
54 hkproj/pytorch-transformer-distributed

Distributed training (multi-node) of a Transformer model

38
Emerging
55 r-xla/stablehlo

Create stableHLO programs in R

38
Emerging
56 paypal/gators

Gators is a package to handle model building with big data and fast...

37
Emerging
57 adalkiran/distributed-inference

A project to demonstrate an approach to designing cross-language and...

37
Emerging
58 alibaba/TePDist

TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed...

37
Emerging
59 hegongshan/Storage-for-AI-Paper

Accelerating AI Training and Inference from Storage Perspective (Must-read...

37
Emerging
60 eagomez2/moduleprofiler

Free open-source package to profile PyTorch models.

36
Emerging
61 neelsomani/kv-marketplace

Cross-GPU KV Cache Marketplace

36
Emerging
62 gmasse/gpu-specs

This project aims to centralize detailed specifications for GPUs,...

36
Emerging
63 AlibabaPAI/FlashModels

Fast and easy distributed model training examples.

36
Emerging
64 lsds/Crossbow

Crossbow: A Multi-GPU Deep Learning System for Training with Small Batch Sizes

36
Emerging
65 CEA-LIST/RPCDataloader

A variant of the PyTorch Dataloader using remote workers.

35
Emerging
66 gsyang33/Driple

🚨 Prediction of the Resource Consumption of Distributed Deep Learning Systems

35
Emerging
67 NERSC/dl-at-scale-training

Deep Learning at Scale Training Event at NERSC

34
Emerging
68 NERSC/sc25-dl-tutorial

Deep Learning at Scale @ SC25

34
Emerging
69 Youhe-Jiang/IJCAI2023-OptimalShardedDataParallel

[IJCAI2023] An automated parallel training system that combines the...

34
Emerging
70 earthai-tech/gofast

gofast: AIO machine learning package

33
Emerging
71 ANRGUSC/ML_onChain

A python-solidity translator that generates on-chain neural networks

33
Emerging
72 google/iopddl

Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on...

32
Emerging
73 astariul/gibbs

Scale your ML workers asynchronously across processes and machines

30
Emerging
74 NERSC/dl4sci25-dl-at-scale

Deep learning for science school material 2025

30
Emerging
75 PLCnext/MLnext-Framework

MLnext Framework is an open source framework for hardware independent...

30
Emerging
76 qhliu26/Dive-into-Big-Model-Training

📑 Dive into Big Model Training

30
Emerging
77 lt-asset/D3

"D3: Differential Testing of Distributed Deep Learning with Model...

29
Experimental
78 rasbt/b3-basic-batchsize-benchmark

Experiments for the blog post "No, We Don't Have to Choose Batch Sizes As...

29
Experimental
79 siboehm/ShallowSpeed

Small scale distributed training of sequential deep learning models, built...

28
Experimental
80 cake-lab/DELI

Optimizing loading training data from cloud bucket storage for cloud-based...

28
Experimental
81 alvarobartt/ml-monitoring-with-wandb

🕵️🤖 Monitoring a PyTorch Lightning CNN with Weights & Biases

27
Experimental
82 marcos-venicius/smlf

A small machine learning framework with ONLY python and math

27
Experimental
83 yanisZirem/prism-profiler

profiler desktop versions

27
Experimental
84 Kushalk0677/Priority-Aware-Adaptive-Scheduling-for-Multi-Model-Edge-AI-Systems

Priority-Aware Edge Scheduler (PAES) for concurrent multi-model AI inference...

26
Experimental
85 0xNaN/edufsdp

A minimal, educational implementation of Fully Sharded Data Parallel (FSDP).

26
Experimental
86 ashishpatel26/Rapidsai_Machine_learning_on_GPU

Rapidsai_Machine_learning_on_GPU

26
Experimental
87 mrtan-ys/RoleML

Role-oriented programming model for distributed ML

25
Experimental
88 Continuum-Intelligence/continuum-hydra

Performance-first ML systems toolkit for environment diagnostics and...

25
Experimental
89 AbdelStark/nostrain

Coordinator-free distributed ML training over Nostr relays.

25
Experimental
90 poojakira/Predictive-GPU-Memory-Defragmenter

A production-grade Transformer-driven system that predicts GPU memory...

22
Experimental
91 Szhuaa/PyFlightProfiler

🌟 Boost Python application performance with PyFlightProfiler, a toolbox for...

22
Experimental
92 rogue-agent1/markov-chain-py

Markov chain simulation with stationary distribution

22
Experimental
93 rogue-agent1/toml2json

toml2json - Convert between TOML and JSON.

22
Experimental
94 Arakiss/hecate-os

Linux distro with automatic hardware detection and per-system optimization....

22
Experimental
95 alpha-one-index/ai-infra-index

Comprehensive technical reference for AI hardware: GPUs, TPUs, inference...

22
Experimental
96 JeffWigger/FastDynamicBatcher

FastDynamicBatcher is a library for batching inputs across requests to...

22
Experimental
97 rogue-agent1/yamltoml

Convert between JSON, YAML, and TOML formats.

22
Experimental
98 DaveAldon/Distributed-ML-with-MLX

🍎👉🍏 Everything you need in order to get started building distributed machine...

21
Experimental
99 Dev-next-gen/Bittensor-rocm

ROCm-compatible fork of Bittensor – Full PyTorch 2.4 ROCm support – Wallet,...

21
Experimental
100 chirasin99/hecate-os

⚙️ Optimize your Linux experience with HecateOS, a performance-driven...

21
Experimental
101 JagjeevanAK/CruxML

(Under-Development) A minified Machine Learning and Deep learning Framework/Library.

20
Experimental
102 dlzou/computron

Serving distributed deep learning models with model parallel swapping.

20
Experimental
103 michael-borck/loco-convoy

Documentation and experiments for running AI inference workloads across multiple GPUs

19
Experimental
104 explcre/pipeDejavu

pipeDejavu: Hardware-aware Latency Predictable, Differentiable Search for...

19
Experimental
105 GUT-AI/memory-bottleneck

Memory Bottleneck of Deep Learning models

17
Experimental
106 explcre/SHUKUN-Technology-AlgorithmIntern-MultiNodeTraining-for-DLmodels-Horovod-ConfigurationTutorial-Perf

SHUKUN Technology Co.,Ltd Algorithm intern (2020/12-2021/5). Multi-GPU,...

17
Experimental
107 Kritim708/multi-gpu-deep-learning-nvidia-workshop

This repository contains a project I created as part of the NVIDIA workshop...

17
Experimental
108 gdf-ai/gdf

Open-source community GPU network for distributed AI model training

15
Experimental
109 Prelf1992/distributed-ml-training-system

A proof-of-concept for a distributed machine learning training system,...

14
Experimental
110 Pects1949/Python-Distributed-ML-Framework

A Python framework for distributed machine learning training, leveraging...

14
Experimental
111 Gaius-del/python_hpc_2025

🚀 Accelerate scientific applications in supercomputing with Python using...

14
Experimental
112 ArslanKamchybekov/raydar

Raydar is the smart lost and found platform designed specifically for UIC...

13
Experimental
113 Jason-Wang313/OmniTrace

A full-stack GPU profiling and simulation framework that bridges high-level...

13
Experimental
114 dsrhaslab/prisma

A data prefetching storage data plane for accelerating DL training performance.

12
Experimental
115 olehxch/mlx-neural-networks

💻 Explore the art and science of neural networks through hands-on examples...

12
Experimental
116 LiYanan2004/MLXPlayground

The basis of mlx for beginners like me. You can try out mlx code and check...

11
Experimental
117 Mamiglia/mergecraft

Mergecraft is a simple library to streamline model merging operations, with...

11
Experimental
118 abhisheks-gh/Veritas_Predictive-Caching-for-File-Systems

Developed for Veritas Technologies LLC, this project optimizes DB workloads...

11
Experimental
119 shivangraval50/distributed-ml-training

Distributed ML training platform achieving 10.6× speedup | PyTorch DDP |...

11
Experimental
120 ajulyav/DL-multiple-GPU

Some important main concepts on training DL models on multiple GPUs

11
Experimental
121 kirillsaidov/prisma

A tiny deep learning library aimed at ease of use and usability.

11
Experimental
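
The tier cutoffs are not stated in this listing, but the 121 scores above are consistent with boundaries at 70, 50, and 30. The sketch below encodes that reading. The thresholds are inferred from the listed rows only (an assumption, not a documented rule), and `tier_for` is a hypothetical helper name.

```python
def tier_for(score: int) -> str:
    """Map a 0-100 quality score to a tier label.

    Thresholds are inferred from the scores listed above:
    70 is the lowest Verified score, 50 the lowest Established,
    and 30 the lowest Emerging. They are assumptions.
    """
    if score >= 70:
        return "Verified"
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# Spot checks against rows from the listing.
print(tier_for(81))  # deepspeedai/DeepSpeed
print(tier_for(49))  # sehoffmann/dmlcloud
```

A project scoring exactly at a boundary (for example helmholtz-analytics/heat at 70) falls into the higher tier under this reading, which matches the rows shown.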

Comparisons in this category