NVIDIA/cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

Score: 67/100 (Established)

This project provides specialized tools for developers to create highly optimized linear algebra operations, particularly general matrix-matrix multiplication (GEMM), on NVIDIA GPUs. It takes kernel definitions and data types as input and produces high-performance CUDA kernels. Researchers, performance engineers, and students working on GPU programming for numerical applications would find this useful.

9,426 stars. Actively maintained with 10 commits in the last 30 days.

Use this if you need to develop custom, extremely fast GPU kernels for linear algebra, especially matrix multiplications, using either the more accessible Python DSLs or the traditional C++ templates.

Not ideal if you are an end-user simply looking to run existing machine learning models or use standard data science libraries without writing custom GPU code.

GPU programming · High-performance computing · Numerical optimization · Deep learning infrastructure · CUDA kernel development
No Package · No Dependents
Maintenance: 17/25
Adoption: 10/25
Maturity: 16/25
Community: 24/25


Stars: 9,426
Forks: 1,725
Language: C++
License:
Last pushed: Mar 12, 2026
Commits (30d): 10

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/NVIDIA/cutlass"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
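The same endpoint can also be called from Python. A minimal sketch using only the standard library; the `quality_url` helper is our own naming, and the exact fields of the JSON response are not documented here, so the fetch is left commented out:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository (helper name is ours)."""
    return f"{BASE}/{category}/{owner}/{repo}"

url = quality_url("ml-frameworks", "NVIDIA", "cutlass")
print(url)

# Fetching requires network access; uncomment to run:
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)  # JSON payload with the repository's quality data
#     print(data)
```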