openmlsys/openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
This project provides practical tutorials and examples for engineers aiming to optimize the performance of AI model operations on NVIDIA GPUs. It demonstrates how to write high-performance GPU code, taking basic operator implementations and applying advanced optimization techniques like shared memory usage and pipeline rearrangement. The target audience is AI/ML engineers and researchers who develop and deploy machine learning models and need to accelerate their computational graphs.
134 stars. No commits in the last 6 months.
Use this if you are an AI/ML engineer or researcher working with NVIDIA GPUs and need to understand or implement highly optimized custom operators for your models.
Not ideal if you are a data scientist or user who primarily uses existing AI frameworks and libraries without needing to dive into low-level GPU programming.
Stars: 134
Forks: 15
Language: Cuda
License: —
Category:
Last pushed: Aug 12, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/openmlsys/openmlsys-cuda"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
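The same request can be made from a script. The sketch below is a minimal Python version using only the standard library. It assumes the URL path follows a category/owner/repository pattern, which is inferred from the single example URL above and not from any API documentation, and that the endpoint returns JSON. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    one example URL shown on this page.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON; its fields are not documented
    here, so the parsed value is returned without interpretation.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to make the request.
    print(build_quality_url("ml-frameworks", "openmlsys", "openmlsys-cuda"))
```

Calling `fetch_quality("ml-frameworks", "openmlsys", "openmlsys-cuda")` issues the same GET request as the curl command and counts against the daily request limit.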
Higher-rated alternatives
brucefan1983/GPUMD
Graphics Processing Units Molecular Dynamics
iree-org/iree
A retargetable MLIR-based machine learning compiler and runtime toolkit.
uxlfoundation/oneDAL
oneAPI Data Analytics Library (oneDAL)
rapidsai/cuml
cuML - RAPIDS Machine Learning Library
NVIDIA/cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra