NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
NCCL (NVIDIA Collective Communications Library) helps high-performance computing developers move data efficiently between multiple GPUs. By optimizing collective communication routines such as all-reduce and broadcast, it speeds up training of large models and processing of large datasets. The input is data distributed across several GPUs; the output is that same data aggregated or redistributed efficiently. It is aimed at engineers building deep learning frameworks or scientific simulation software.
4,521 stars. Actively maintained with 1 commit in the last 30 days.
Use this if you are a system architect or developer building applications that require fast, collective data transfers between multiple GPUs, either within a single server or across a cluster.
Not ideal if you are a data scientist primarily using high-level deep learning frameworks without needing to optimize low-level GPU communication primitives.
Stars: 4,521
Forks: 1,158
Language: C++
License: —
Category: —
Last pushed: Mar 08, 2026
Commits (30d): 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/NVIDIA/nccl"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Related frameworks
iree-org/iree
A retargetable MLIR-based machine learning compiler and runtime toolkit.
brucefan1983/GPUMD
Graphics Processing Units Molecular Dynamics
uxlfoundation/oneDAL
oneAPI Data Analytics Library (oneDAL)
rapidsai/cuml
cuML - RAPIDS Machine Learning Library
NVIDIA/cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra