deepreinforce-ai/CUDA-L2

CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

Overall score: 42 / 100 (Emerging)

This system helps AI/ML practitioners speed up matrix multiplication, a foundational operation on NVIDIA GPUs. It takes your existing half-precision (FP16) matrix multiplication workloads and runs them through custom CUDA kernels optimized via reinforcement learning, delivering significantly faster results than standard libraries such as cuBLAS. It is designed for AI engineers, machine learning scientists, and researchers running large language models or other compute-intensive AI applications.


Use this if you are developing or deploying AI models, especially large language models, and need to accelerate half-precision matrix multiplication performance on NVIDIA A100, RTX 3090, or H100 GPUs.

Not ideal if your workload does not involve half-precision matrix multiplication or if you are using a GPU type not specifically supported (like older or non-NVIDIA GPUs).

Tags: AI/ML operations · GPU optimization · Large Language Models · AI model deployment · High-performance computing
No package · No dependents
Maintenance: 6 / 25
Adoption: 10 / 25
Maturity: 13 / 25
Community: 13 / 25


Stars: 472
Forks: 25
Language: Cuda
License: MIT
Last pushed: Jan 08, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/deepreinforce-ai/CUDA-L2"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
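The curl command above can also be issued from a script. A minimal Python sketch follows; the endpoint URL is the one shown on this page, while the JSON response shape is an assumption (inspect the raw body before relying on specific fields), and `quality_url`/`fetch_quality` are illustrative helper names, not part of any official client.

```python
# Sketch of fetching the same quality data programmatically.
# Assumption: the endpoint returns JSON; no auth header is needed
# within the free 100-requests/day tier.
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """GET the endpoint and parse the body as JSON (assumed format)."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (performs a network call, so it is commented out):
# data = fetch_quality("deepreinforce-ai", "CUDA-L2")
# print(data)
```

Keeping the URL construction in its own function makes the non-network part easy to test and reuse for other repositories on the same service.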