uccl-project/uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

Score: 68 / 100 (Established)

This project speeds up communication between Graphics Processing Units (GPUs) for demanding AI and machine learning workloads. It plugs in beneath existing machine learning code that uses standard communication libraries such as NCCL or RCCL, moving data between GPUs faster and shortening training times. Data scientists, machine learning engineers, and AI researchers working with large-scale GPU clusters will find it useful for accelerating model training and distributed computation.

1,234 stars. Actively maintained with 58 commits in the last 30 days.

Use this if you are running large-scale distributed machine learning workloads on multiple GPUs and need to drastically improve data transfer speed and overall training efficiency.

Not ideal if your machine learning tasks run on a single GPU or if you are not experiencing communication bottlenecks across your GPU cluster.

distributed-machine-learning GPU-acceleration deep-learning-training AI-infrastructure high-performance-computing
No package published. No dependents.
Maintenance: 22 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 20 / 25


Stars: 1,234
Forks: 128
Language: C++
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 58

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/uccl-project/uccl"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
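The endpoint above returns JSON, but its response schema is not documented on this page, so the field names below (`score`, `breakdown`, and its keys) are assumptions for illustration only. A minimal Python sketch for fetching and summarizing such a response might look like:

```python
import json
import urllib.request

# Endpoint from the curl example above.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/uccl-project/uccl"

def summarize(payload: dict) -> str:
    """Render a quality payload as a short text summary.

    Field names ("score", "breakdown") are assumed, not confirmed by
    the API docs; adjust to the real schema before relying on this.
    """
    lines = [f"overall: {payload.get('score')}/100"]
    for name, value in payload.get("breakdown", {}).items():
        lines.append(f"{name}: {value}/25")
    return "\n".join(lines)

def fetch() -> dict:
    # Real usage (100 requests/day without a key):
    with urllib.request.urlopen(API_URL) as resp:
        return json.load(resp)

# Offline demo using a hypothetical payload mirroring the scores shown above,
# so the sketch runs without hitting the network.
sample = {
    "score": 68,
    "breakdown": {"maintenance": 22, "adoption": 10,
                  "maturity": 16, "community": 20},
}
print(summarize(sample))
```

For real calls, replace the sample payload with `fetch()`; at higher volumes, pass your API key however the service documents it.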