Venkat2811/yali

Speed-of-light software efficiency via ultra-low-latency primitives for communication collectives

Score: 28 / 100 (Experimental)

This project offers an optimized library for speeding up data exchange between two NVIDIA GPUs connected by NVLink. It takes arrays of numerical data residing on the two GPUs, efficiently combines (reduces) them, and broadcasts the result back to both GPUs. High-performance computing engineers and researchers working with GPU-accelerated workloads will find this useful for reducing the time spent on collective communication operations.

Use this if you are running computationally intensive tasks that involve frequent data aggregation (like "AllReduce" operations) between exactly two NVLink-connected NVIDIA GPUs and you need faster communication with more consistent performance than standard libraries.

Not ideal if your setup involves more than two GPUs, if your GPUs are not connected via NVLink, or if you need to perform collective operations across multiple compute nodes.
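The AllReduce operation described above can be sketched in plain Python — this only illustrates the semantics (elementwise reduce, then replicate to both participants); yali itself is a CUDA library, and none of its actual API is shown here:

```python
# Illustrative sketch of AllReduce-sum semantics between two "devices".
# Plain Python lists stand in for GPU buffers; this is not yali's API.
def allreduce_sum(buf_a, buf_b):
    """Elementwise-sum two equal-length buffers and replicate the result,
    so both devices end up holding the same reduced array."""
    if len(buf_a) != len(buf_b):
        raise ValueError("AllReduce requires equal-length buffers")
    reduced = [a + b for a, b in zip(buf_a, buf_b)]
    # Return one independent copy per device.
    return reduced[:], reduced[:]

gpu0, gpu1 = allreduce_sum([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(gpu0)          # [11.0, 22.0, 33.0]
print(gpu0 == gpu1)  # True
```

A real implementation over NVLink would move partial sums between the GPUs' memories rather than materializing both buffers in one place, which is where the latency of the communication primitives dominates.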

Tags: GPU-accelerated computing, High-performance computing, Parallel processing, Deep learning infrastructure, Scientific simulation
No Package · No Dependents
Maintenance 10 / 25
Adoption 5 / 25
Maturity 13 / 25
Community 0 / 25


Stars: 13
Forks:
Language: Cuda
License: MIT
Last pushed: Jan 22, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Venkat2811/yali"
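The same endpoint can be called from Python. The URL below is taken from the curl command above; the response schema is not documented on this page, so the fetch helper just decodes whatever JSON comes back:

```python
# Minimal client for the quality API shown above. Only the endpoint URL
# comes from this page; everything else is a generic urllib sketch.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, repo: str) -> str:
    """Build the API URL for a repo's quality report."""
    return f"{BASE}/{category}/{repo}"

def fetch_quality(category: str, repo: str) -> dict:
    """Fetch and decode the JSON quality report (no API key: 100 requests/day)."""
    with urllib.request.urlopen(quality_url(category, repo)) as resp:
        return json.load(resp)

print(quality_url("ml-frameworks", "Venkat2811/yali"))
# https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Venkat2811/yali
```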

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.