microsoft/nnscaler
nnScaler: Compiling DNN models for Parallel Training
When training large Deep Neural Network (DNN) models, a single GPU quickly becomes a performance bottleneck. nnScaler takes an existing DNN model written for a single GPU and compiles it into a version that trains efficiently across multiple GPUs, shortening training times for DNN scientists and machine learning engineers.
125 stars. No commits in the last 6 months.
Use this if you need to train large deep neural network models more quickly by leveraging multiple GPUs, without having to manually re-architect your model for parallel execution.
Not ideal if your models are small enough to train efficiently on a single GPU or if you require fine-grained, custom control over every aspect of parallelization.
Stars: 125
Forks: 22
Language: Python
License: MIT
Category:
Last pushed: Sep 23, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/microsoft/nnscaler"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
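For scripted access, here is a minimal Python sketch (standard library only) that builds the endpoint URL and parses a response. The JSON field names below are assumptions inferred from the fields shown on this page, not a documented schema; a real integration should check the API's actual response format.

```python
import json
from urllib.parse import quote

# Base of the pt-edge quality API, taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

url = quality_url("ml-frameworks", "microsoft", "nnscaler")

# Sample payload standing in for a live response.
# Field names here are hypothetical, based on the stats listed on this page.
sample = json.loads("""
{
  "stars": 125,
  "forks": 22,
  "language": "Python",
  "license": "MIT",
  "last_pushed": "2025-09-23",
  "commits_30d": 0
}
""")

print(url)
print(f"{sample['stars']} stars, {sample['forks']} forks, "
      f"{sample['commits_30d']} commits in the last 30 days")
```

Swapping in `urllib.request.urlopen(url)` (or `requests.get`) for the sample payload turns this into a live client, subject to the 100 requests/day limit noted above.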
Higher-rated alternatives
d9d-project/d9d
d9d - d[istribute]d - distributed training framework based on PyTorch that tries to be efficient...
Scottcjn/exo-cuda
Exo distributed inference with NVIDIA CUDA support via tinygrad
nirw4nna/dsc
Tensor library & inference framework for machine learning
Zzzxkxz/cuda-fp8-ampere
🚀 Accelerate FP8 GEMM tasks on RTX 3090 Ti using lightweight storage and efficient tensor cores...
Wasisange/cuda-kernels-collection
Custom CUDA kernels for optimized tensor operations in deep learning.