d9d-project/d9d
d9d - d[istribute]d - a distributed training framework based on PyTorch that aims to be efficient yet hackable
This framework helps machine learning researchers and engineers efficiently train very large deep learning models across multiple GPUs or machines. You provide your PyTorch model and data, and it manages the complex setup for distributed training, allowing you to get a trained model faster. It's designed for those who need to experiment with novel training approaches without being limited by rigid, predefined systems.
Available on PyPI.
Use this if you are a deep learning researcher or ML engineer building and training custom large-scale models in PyTorch and need a flexible, performant way to distribute your training across multiple devices.
Not ideal if you need a simple command-line tool for training standard, pre-defined models without much customization, or if you are working with older PyTorch versions or hardware.
Stars
13
Forks
2
Language
Python
License
Apache-2.0
Category
ML frameworks
Last pushed
Mar 18, 2026
Commits (30d)
0
Dependencies
6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/d9d-project/d9d"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
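For scripted access, the same endpoint can be called from Python. This is a minimal sketch built from the curl example above; the URL structure (`quality/<category>/<owner>/<repo>`) is taken from that example, but the response schema is not documented here and is left unparsed.

```python
# Sketch of a client for the pt-edge quality API, based on the curl
# example above. The response JSON schema is an assumption and is not
# interpreted here.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the API URL for a repo's quality data."""
    return f"{BASE}/{category}/{owner}/{repo}"

url = quality_url("ml-frameworks", "d9d-project", "d9d")
print(url)

# Fetch the data (100 requests/day without a key):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
```

The fetch itself is left commented out so the snippet runs offline; with a free key (1,000 requests/day) you would add it as a header or query parameter per the service's docs.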
Related frameworks
microsoft/nnscaler
nnScaler: Compiling DNN models for Parallel Training
Scottcjn/exo-cuda
Exo distributed inference with NVIDIA CUDA support via tinygrad
nirw4nna/dsc
Tensor library & inference framework for machine learning
Zzzxkxz/cuda-fp8-ampere
🚀 Accelerate FP8 GEMM tasks on RTX 3090 Ti using lightweight storage and efficient tensor cores...
Wasisange/cuda-kernels-collection
Custom CUDA kernels for optimized tensor operations in deep learning.