microsoft/nnscaler
nnScaler: Compiling DNN models for Parallel Training
When training large Deep Neural Network (DNN) models, a single GPU quickly becomes a performance bottleneck. nnScaler takes an existing DNN model written for a single GPU and compiles it into a version that trains efficiently across multiple GPUs, shortening training times for DNN scientists and machine learning engineers.
125 stars. No commits in the last 6 months.
Use this if you need to train large deep neural network models more quickly by leveraging multiple GPUs, without having to manually re-architect your model for parallel execution.
Not ideal if your models are small enough to train efficiently on a single GPU or if you require fine-grained, custom control over every aspect of parallelization.
Stars: 125
Forks: 22
Language: Python
License: MIT
Category:
Last pushed: Sep 23, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/microsoft/nnscaler"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
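For scripted access, here is a minimal Python sketch (standard library only) that builds the endpoint URL and parses a response. The JSON field names below are assumptions inferred from the fields shown on this page, not a documented schema; a real integration should check the API's actual response format.

```python
import json
from urllib.parse import quote

# Base of the pt-edge quality API, taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

url = quality_url("ml-frameworks", "microsoft", "nnscaler")

# Sample payload standing in for a live response.
# Field names here are hypothetical, based on the stats listed on this page.
sample = json.loads("""
{
  "stars": 125,
  "forks": 22,
  "language": "Python",
  "license": "MIT",
  "last_pushed": "2025-09-23",
  "commits_30d": 0
}
""")

print(url)
print(f"{sample['stars']} stars, {sample['forks']} forks, "
      f"{sample['commits_30d']} commits in the last 30 days")
```

Swapping in `urllib.request.urlopen(url)` (or `requests.get`) for the sample payload turns this into a live client, subject to the 100 requests/day limit noted above.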
Higher-rated alternatives
d9d-project/d9d
d9d - d[istribute]d - distributed training framework based on PyTorch that tries to be efficient...
Scottcjn/exo-cuda
Exo distributed inference with NVIDIA CUDA support via tinygrad
nirw4nna/dsc
Tensor library & inference framework for machine learning
Zzzxkxz/cuda-fp8-ampere
🚀 Accelerate FP8 GEMM tasks on RTX 3090 Ti using lightweight storage and efficient tensor cores...
Wasisange/cuda-kernels-collection
Custom CUDA kernels for optimized tensor operations in deep learning.