Triton Inference Deployment: ML Frameworks
Tools, frameworks, and guides for deploying machine learning models with NVIDIA Triton Inference Server, covering optimization, benchmarking, and integration patterns. Does NOT include general-purpose inference serving, model training, or Triton kernel programming (see mojo-ml-frameworks for low-level GPU kernel work).
This list tracks 42 Triton inference deployment projects. Eight score above 50, placing them in the established tier. The highest-rated is triton-inference-server/server at 66/100 with 10,426 stars, and 4 of the top 10 are actively maintained.
Fetch the projects as JSON (the request below returns the first 20 of the 42; raise `limit` to retrieve the full list):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=triton-inference-deployment&limit=20"
```

Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
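The same query is easy to script. A minimal sketch using only the Python standard library; the endpoint and query parameters come from the curl example above, but the shape of the JSON response (field names such as `projects`) is an assumption, so inspect the payload before relying on it:

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 50) -> str:
    """Build the dataset query URL with properly encoded parameters."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return BASE_URL + "?" + urllib.parse.urlencode(params)

def fetch_projects(url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON payload.

    NOTE: the response structure is not documented here; treat any
    field names you read from the result as assumptions to verify.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

# limit=50 comfortably covers all 42 tracked projects.
url = build_url("ml-frameworks", "triton-inference-deployment", limit=50)
print(url)
```

Keeping URL construction separate from the network call makes the query string easy to unit-test without hitting the rate-limited endpoint.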
| # | Framework | Score | Tier |
|---|---|---|---|
| 1 | triton-inference-server/server<br>The Triton Inference Server provides an optimized cloud and edge inferencing... | 66 | Established |
| 2 | gpu-mode/Triton-Puzzles<br>Puzzles for learning Triton | | Established |
| 3 | hailo-ai/hailo_model_zoo<br>The Hailo Model Zoo includes pre-trained models and a full building and... | | Established |
| 4 | open-mmlab/mmdeploy<br>OpenMMLab Model Deployment Framework | | Established |
| 5 | hyperai/tvm-cn<br>TVM Documentation in Simplified Chinese / TVM 中文文档 | | Established |
| 6 | triton-inference-server/model_analyzer<br>Triton Model Analyzer is a CLI tool to help with better understanding of the... | | Established |
| 7 | ot-triton-lab/flash-sinkhorn<br>FlashSinkhorn: IO-Aware Entropic Optimal Transport in PyTorch + Triton.... | | Established |
| 8 | triton-inference-server/model_navigator<br>Triton Model Navigator is an inference toolkit designed for optimizing and... | | Established |
| 9 | LukasHedegaard/pytorch-benchmark<br>Easily benchmark PyTorch model FLOPs, latency, throughput, allocated gpu... | | Emerging |
| 10 | srush/Tensor-Puzzles<br>Solve puzzles. Improve your pytorch. | | Emerging |
| 11 | hyperai/triton-cn<br>Triton Documentation in Simplified Chinese / Triton 中文文档 | | Emerging |
| 12 | srush/Triton-Puzzles<br>Puzzles for learning Triton | | Emerging |
| 13 | suvojit-0x55aa/mixed-precision-pytorch<br>Training with FP16 weights in PyTorch | | Emerging |
| 14 | triton-inference-server/pytriton<br>PyTriton is a Flask/FastAPI-like interface that simplifies Triton's... | | Emerging |
| 15 | sachinsharma9780/Build-ML-pipelines-for-Computer-Vision-NLP-and-Graph-Neural-Networks-using-Nvidia-Triton-Server<br>Build ML pipelines for Computer Vision, NLP and Graph Neural Networks using... | | Emerging |
| 16 | BobMcDear/attorch<br>A subset of PyTorch's neural network modules, written in Python using... | | Emerging |
| 17 | philipturner/metal-flash-attention<br>FlashAttention (Metal Port) | | Emerging |
| 18 | alexzhang13/flashattention2-custom-mask<br>Triton implementation of FlashAttention2 that adds Custom Masks. | | Emerging |
| 19 | tnbar/tednet<br>TedNet: A Pytorch Toolkit for Tensor Decomposition Networks | | Emerging |
| 20 | kakaobrain/trident<br>A performance library for machine learning applications. | | Emerging |
| 21 | anujinho/trident<br>Official repository for the paper TRIDENT: Transductive Decoupled... | | Emerging |
| 22 | ai-dynamo/aitune<br>NVIDIA AITune is an inference toolkit designed for tuning and deploying Deep... | | Emerging |
| 23 | dtunai/Tri-RMSNorm<br>Efficient kernel for RMS normalization with fused operations, includes both... | | Emerging |
| 24 | fversaci/cassandra-dali-plugin<br>Cassandra plugin for NVIDIA DALI | | Emerging |
| 25 | daemyung/practice-triton<br>Hands-on Triton practice (original Korean title: 삼각형의 실전! Triton) | | Emerging |
| 26 | jayeshmahapatra/triton-fastapi-docker<br>A repository demonstrating deploying ML models using Triton + FastAPI + Docker | | Emerging |
| 27 | MaxLSB/flash-attn2<br>FlashAttention for sliding window attention in Triton (fwd + bwd pass) | | Emerging |
| 28 | ZrobMiloudaa/jetson-orin-matmul-analysis<br>🔍 Analyze CUDA matrix multiplication performance and power consumption on... | | Emerging |
| 29 | hiennguyen9874/triton-face-recognition<br>Triton face detection & recognition | | Experimental |
| 30 | indri-voice/vit.triton<br>VIT inference in triton because, why not? | | Experimental |
| 31 | niyazed/triton-mnist-example<br>MNIST inference example on NVIDIA Triton Inference Server | | Experimental |
| 32 | Anggipratama17/triton-accelerated-attention<br>🚀 Implement Triton GPU kernels for multi-head self-attention, enabling... | | Experimental |
| 33 | jrajath94/triton-inference-kernels<br>Fused softmax + Flash Attention in OpenAI Triton; 50x VRAM reduction at seq_len=2048 | | Experimental |
| 34 | Cre4T3Tiv3/jetson-orin-matmul-analysis<br>Scientific CUDA benchmarking framework: 4 implementations x 3 power modes x... | | Experimental |
| 35 | angelolamonaca/PyTorch-Precision-Converter<br>A flexible utility for converting tensor precision in PyTorch models and... | | Experimental |
| 36 | lengstrom/flashback<br>A FlashAttention backwards-over-backwards ⚡🔙🔙 | | Experimental |
| 37 | dbrll/ATTN-11<br>Paper Tape is All You Need | | Experimental |
| 38 | Achiwilms/NVIDIA-Triton-Deployment-Quickstart<br>QuickStart for Deploying a Basic Model on the Triton Inference Server | | Experimental |
| 39 | palapav/triton-compute-kernels<br>A collection of Triton compute kernels for common ML operations | | Experimental |
| 40 | LessUp/cuflash-attn<br>Pure CUDA C++ FlashAttention Forward/Backward Pass with Causal Masking &... | | Experimental |
| 41 | kalyani-25/Reimplementation_flash-attention-from-scratch<br>16-step CUDA optimization of FlashAttention-2 achieving 99.2% of official... | | Experimental |
| 42 | JonSnow1807/Fused-LayerNorm-CUDA-Operator<br>High-performance CUDA implementation of LayerNorm for PyTorch achieving... | | Experimental |
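Several projects above (the server itself at #1, and quickstarts such as #31 and #38) serve models out of a Triton model repository, where each model directory carries a `config.pbtxt`. A minimal sketch, assuming a hypothetical ONNX image classifier; the model name, tensor names, and dimensions are illustrative, not taken from any repository listed here:

```text
# models/resnet50/config.pbtxt -- minimal Triton model configuration (illustrative)
name: "resnet50"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input"            # must match the input tensor name in the ONNX graph
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]    # per-sample shape; the batch dim is implied by max_batch_size
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Tools such as model_analyzer (#6) and model_navigator (#8) exist precisely to search over and validate configurations like this one.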