dtunai/Tri-RMSNorm
An efficient kernel for RMS normalization with fused operations; it implements both the forward and backward passes and is compatible with PyTorch.
This package helps deep learning engineers accelerate neural network training and inference by providing a highly optimized Root Mean Square (RMS) layer normalization kernel. It takes PyTorch tensors as input and returns normalized tensors, with significantly faster forward and backward passes. It is aimed at developers building and training large-scale deep learning models.
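For reference, the computation the kernel accelerates can be sketched in plain NumPy. This is a minimal illustration of RMS layer normalization, not the package's fused Triton/GPU implementation; the function name `rms_norm` and the `eps` default are illustrative, not taken from the library's API.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMS layer normalization: divide each row by its root-mean-square
    # (plus a small eps for numerical stability), then apply a learned
    # per-feature gain. Unlike LayerNorm, no mean is subtracted.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

# Example: for the row [3, 4], RMS = sqrt((9 + 16) / 2) = sqrt(12.5)
x = np.array([[3.0, 4.0]])
w = np.ones(2)  # identity gain
print(rms_norm(x, w))
```

A fused GPU kernel computes the same quantity but combines the reduction, division, and gain into a single pass over memory, which is where the speedup over a naive composition of PyTorch ops comes from.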
No commits in the last 6 months.
Use this if you are a deep learning engineer or researcher looking to speed up the RMS normalization steps in your PyTorch models, especially on GPU.
Not ideal if you are not working with deep learning models, PyTorch, or GPU acceleration, or if you need a solution for CPUs.
Stars
12
Forks
2
Language
Python
License
Apache-2.0
Category
Last pushed
Jun 05, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/dtunai/Tri-RMSNorm"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
gpu-mode/Triton-Puzzles
Puzzles for learning Triton
hailo-ai/hailo_model_zoo
The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment
open-mmlab/mmdeploy
OpenMMLab Model Deployment Framework
hyperai/tvm-cn
TVM Documentation in Chinese Simplified / TVM 中文文档