Triton Inference Deployment: ML Frameworks
Tools, frameworks, and guides for deploying machine learning models with NVIDIA Triton Inference Server, covering optimization, benchmarking, and integration patterns. Does NOT include general-purpose inference serving, model training, or Triton kernel programming (see mojo-ml-frameworks for low-level GPU kernel work).
This list tracks 42 Triton inference deployment projects. Eight score above 50, placing them in the established tier. The highest-rated is triton-inference-server/server at 66/100 with 10,426 stars, and 4 of the top 10 are actively maintained.
Fetch the projects as JSON (the request below returns the first 20 of the 42; raise `limit` to retrieve the full list):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=triton-inference-deployment&limit=20"
```

Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
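The same query is easy to script. A minimal sketch using only the Python standard library; the endpoint and query parameters come from the curl example above, but the shape of the JSON response (field names such as `projects`) is an assumption, so inspect the payload before relying on it:

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 50) -> str:
    """Build the dataset query URL with properly encoded parameters."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return BASE_URL + "?" + urllib.parse.urlencode(params)

def fetch_projects(url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON payload.

    NOTE: the response structure is not documented here; treat any
    field names you read from the result as assumptions to verify.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

# limit=50 comfortably covers all 42 tracked projects.
url = build_url("ml-frameworks", "triton-inference-deployment", limit=50)
print(url)
```

Keeping URL construction separate from the network call makes the query string easy to unit-test without hitting the rate-limited endpoint.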
| # | Framework | Score | Tier |
|---|---|---|---|
| 1 | triton-inference-server/server<br>The Triton Inference Server provides an optimized cloud and edge inferencing... | 66 | Established |
| 2 | gpu-mode/Triton-Puzzles<br>Puzzles for learning Triton | | Established |
| 3 | hailo-ai/hailo_model_zoo<br>The Hailo Model Zoo includes pre-trained models and a full building and... | | Established |
| 4 | open-mmlab/mmdeploy<br>OpenMMLab Model Deployment Framework | | Established |
| 5 | hyperai/tvm-cn<br>TVM Documentation in Simplified Chinese / TVM 中文文档 | | Established |
| 6 | triton-inference-server/model_analyzer<br>Triton Model Analyzer is a CLI tool to help with better understanding of the... | | Established |
| 7 | ot-triton-lab/flash-sinkhorn<br>FlashSinkhorn: IO-Aware Entropic Optimal Transport in PyTorch + Triton.... | | Established |
| 8 | triton-inference-server/model_navigator<br>Triton Model Navigator is an inference toolkit designed for optimizing and... | | Established |
| 9 | LukasHedegaard/pytorch-benchmark<br>Easily benchmark PyTorch model FLOPs, latency, throughput, allocated gpu... | | Emerging |
| 10 | srush/Tensor-Puzzles<br>Solve puzzles. Improve your pytorch. | | Emerging |
| 11 | hyperai/triton-cn<br>Triton Documentation in Simplified Chinese / Triton 中文文档 | | Emerging |
| 12 | srush/Triton-Puzzles<br>Puzzles for learning Triton | | Emerging |
| 13 | suvojit-0x55aa/mixed-precision-pytorch<br>Training with FP16 weights in PyTorch | | Emerging |
| 14 | triton-inference-server/pytriton<br>PyTriton is a Flask/FastAPI-like interface that simplifies Triton's... | | Emerging |
| 15 | sachinsharma9780/Build-ML-pipelines-for-Computer-Vision-NLP-and-Graph-Neural-Networks-using-Nvidia-Triton-Server<br>Build ML pipelines for Computer Vision, NLP and Graph Neural Networks using... | | Emerging |
| 16 | BobMcDear/attorch<br>A subset of PyTorch's neural network modules, written in Python using... | | Emerging |
| 17 | philipturner/metal-flash-attention<br>FlashAttention (Metal Port) | | Emerging |
| 18 | alexzhang13/flashattention2-custom-mask<br>Triton implementation of FlashAttention2 that adds Custom Masks. | | Emerging |
| 19 | tnbar/tednet<br>TedNet: A Pytorch Toolkit for Tensor Decomposition Networks | | Emerging |
| 20 | kakaobrain/trident<br>A performance library for machine learning applications. | | Emerging |
| 21 | anujinho/trident<br>Official repository for the paper TRIDENT: Transductive Decoupled... | | Emerging |
| 22 | ai-dynamo/aitune<br>NVIDIA AITune is an inference toolkit designed for tuning and deploying Deep... | | Emerging |
| 23 | dtunai/Tri-RMSNorm<br>Efficient kernel for RMS normalization with fused operations, includes both... | | Emerging |
| 24 | fversaci/cassandra-dali-plugin<br>Cassandra plugin for NVIDIA DALI | | Emerging |
| 25 | daemyung/practice-triton<br>Hands-on Triton practice (original Korean title: 삼각형의 실전! Triton) | | Emerging |
| 26 | jayeshmahapatra/triton-fastapi-docker<br>A repository demonstrating deploying ML models using Triton + FastAPI + Docker | | Emerging |
| 27 | MaxLSB/flash-attn2<br>FlashAttention for sliding window attention in Triton (fwd + bwd pass) | | Emerging |
| 28 | ZrobMiloudaa/jetson-orin-matmul-analysis<br>🔍 Analyze CUDA matrix multiplication performance and power consumption on... | | Emerging |
| 29 | hiennguyen9874/triton-face-recognition<br>Triton face detection & recognition | | Experimental |
| 30 | indri-voice/vit.triton<br>VIT inference in triton because, why not? | | Experimental |
| 31 | niyazed/triton-mnist-example<br>MNIST inference example on NVIDIA Triton Inference Server | | Experimental |
| 32 | Anggipratama17/triton-accelerated-attention<br>🚀 Implement Triton GPU kernels for multi-head self-attention, enabling... | | Experimental |
| 33 | jrajath94/triton-inference-kernels<br>Fused softmax + Flash Attention in OpenAI Triton; 50x VRAM reduction at seq_len=2048 | | Experimental |
| 34 | Cre4T3Tiv3/jetson-orin-matmul-analysis<br>Scientific CUDA benchmarking framework: 4 implementations x 3 power modes x... | | Experimental |
| 35 | angelolamonaca/PyTorch-Precision-Converter<br>A flexible utility for converting tensor precision in PyTorch models and... | | Experimental |
| 36 | lengstrom/flashback<br>A FlashAttention backwards-over-backwards ⚡🔙🔙 | | Experimental |
| 37 | dbrll/ATTN-11<br>Paper Tape is All You Need | | Experimental |
| 38 | Achiwilms/NVIDIA-Triton-Deployment-Quickstart<br>QuickStart for Deploying a Basic Model on the Triton Inference Server | | Experimental |
| 39 | palapav/triton-compute-kernels<br>A collection of Triton compute kernels for common ML operations | | Experimental |
| 40 | LessUp/cuflash-attn<br>Pure CUDA C++ FlashAttention Forward/Backward Pass with Causal Masking &... | | Experimental |
| 41 | kalyani-25/Reimplementation_flash-attention-from-scratch<br>16-step CUDA optimization of FlashAttention-2 achieving 99.2% of official... | | Experimental |
| 42 | JonSnow1807/Fused-LayerNorm-CUDA-Operator<br>High-performance CUDA implementation of LayerNorm for PyTorch achieving... | | Experimental |
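Several projects above (the server itself at #1, and quickstarts such as #31 and #38) serve models out of a Triton model repository, where each model directory carries a `config.pbtxt`. A minimal sketch, assuming a hypothetical ONNX image classifier; the model name, tensor names, and dimensions are illustrative, not taken from any repository listed here:

```text
# models/resnet50/config.pbtxt -- minimal Triton model configuration (illustrative)
name: "resnet50"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input"            # must match the input tensor name in the ONNX graph
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]    # per-sample shape; the batch dim is implied by max_batch_size
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Tools such as model_analyzer (#6) and model_navigator (#8) exist precisely to search over and validate configurations like this one.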