NVIDIA/nccl

Optimized primitives for collective multi-GPU communication

Score: 64/100 (Established)

This library helps high-performance-computing developers move data efficiently between multiple GPUs. It speeds up training of large models and processing of large datasets by optimizing collective communication routines such as all-reduce and broadcast. The input is data distributed across several GPUs; the output is that same data aggregated or redistributed efficiently, which makes the library a good fit for engineers building deep learning frameworks or scientific simulation software.
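
To make the all-reduce pattern concrete, here is a minimal sketch of a single-process all-reduce across all visible GPUs, following the usage pattern of NCCL's public API (ncclCommInitAll, ncclAllReduce). It is illustrative rather than taken from the repository: buffer contents and error checking are omitted for brevity.

// Minimal sketch: single-process sum all-reduce across all visible GPUs.
// Buffers are left uninitialized and errors are not checked, for brevity.
#include <cuda_runtime.h>
#include <nccl.h>
#include <cstdio>
#include <vector>

int main() {
  int ndev = 0;
  cudaGetDeviceCount(&ndev);

  const size_t count = 1 << 20;  // elements per GPU
  std::vector<ncclComm_t> comms(ndev);
  std::vector<cudaStream_t> streams(ndev);
  std::vector<float*> sendbuf(ndev), recvbuf(ndev);

  // One communicator per device, all owned by this process.
  // A null device list means devices 0..ndev-1.
  ncclCommInitAll(comms.data(), ndev, nullptr);

  for (int i = 0; i < ndev; ++i) {
    cudaSetDevice(i);
    cudaStreamCreate(&streams[i]);
    cudaMalloc(&sendbuf[i], count * sizeof(float));
    cudaMalloc(&recvbuf[i], count * sizeof(float));
  }

  // Issue one all-reduce per device. Grouping the calls lets a single
  // thread enqueue collectives on every communicator without deadlock.
  ncclGroupStart();
  for (int i = 0; i < ndev; ++i)
    ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                  comms[i], streams[i]);
  ncclGroupEnd();

  for (int i = 0; i < ndev; ++i) {
    cudaSetDevice(i);
    cudaStreamSynchronize(streams[i]);  // wait for the reduction to finish
    cudaFree(sendbuf[i]);
    cudaFree(recvbuf[i]);
    cudaStreamDestroy(streams[i]);
    ncclCommDestroy(comms[i]);
  }
  printf("all-reduce complete on %d GPUs\n", ndev);
  return 0;
}

For multi-node use, the same collective calls apply, but each rank creates its communicator with ncclGetUniqueId plus ncclCommInitRank instead of ncclCommInitAll.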

4,521 stars. Maintained, with 1 commit in the last 30 days.

Use this if you are a system architect or developer building applications that require fast, collective data transfers between multiple GPUs, either within a single server or across a cluster.

Not ideal if you are a data scientist primarily using high-level deep learning frameworks without needing to optimize low-level GPU communication primitives.

Tags: GPU-computing, deep-learning-infrastructure, high-performance-computing, parallel-computing, distributed-systems-development
No package. No dependents.
Maintenance: 13/25
Adoption: 10/25
Maturity: 16/25
Community: 25/25
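
The four subscores sum to the overall score: 13 + 10 + 16 + 25 = 64/100.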

Stars: 4,521
Forks: 1,158
Language: C++
License:
Last pushed: Mar 08, 2026
Commits (30d): 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/NVIDIA/nccl"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.