openmlsys/openmlsys-cuda

Tutorials for writing high-performance GPU operators in AI frameworks.

/ 100

Emerging

This project provides practical tutorials and examples for engineers who want to optimize AI model operators on NVIDIA GPUs. It shows how to write high-performance GPU code by starting from basic operator implementations and applying optimizations such as shared-memory usage and pipeline rearrangement. The target audience is AI/ML engineers and researchers who develop and deploy machine learning models and need to accelerate their computational graphs.

134 stars. No commits in the last 6 months.

Use this if you are an AI/ML engineer or researcher working with NVIDIA GPUs and need to understand or implement highly optimized custom operators for your models.

Not ideal if you are a data scientist or other user who relies on existing AI frameworks and libraries and does not need low-level GPU programming.

GPU-acceleration · AI-model-optimization · deep-learning-inference · machine-learning-engineering · CUDA-programming

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 14 / 25


Stars: 134

Forks:

Language: Cuda

License: —

Higher-rated alternatives

brucefan1983/GPUMD

Graphics Processing Units Molecular Dynamics

iree-org/iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

uxlfoundation/oneDAL

oneAPI Data Analytics Library (oneDAL)

rapidsai/cuml

cuML - RAPIDS Machine Learning Library

NVIDIA/cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra
