triton-inference-server/model_analyzer
Triton Model Analyzer is a CLI tool for understanding the compute and memory requirements of models served by Triton Inference Server.
This tool helps machine learning engineers and MLOps professionals optimize how their models run on NVIDIA's Triton Inference Server. It takes your model files and hardware specifications, searches over candidate configurations, and generates detailed reports showing the throughput, latency, and resource trade-offs of each setting, helping you choose the best setup for your production environment.
507 stars. Actively maintained with 4 commits in the last 30 days.
Use this if you need to fine-tune the deployment of your AI models on Triton Inference Server to meet specific performance or resource efficiency targets.
Not ideal if you are looking for a tool to train or develop your AI models, as this focuses solely on optimizing their deployment and inference.
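For a concrete starting point, the tool's documented entry point is the "model-analyzer profile" subcommand. Below is a minimal sketch of driving it from Python, assuming model-analyzer is installed (pip install triton-model-analyzer) and on PATH; the model repository path and model name are placeholders you would replace with your own.

    import subprocess

    # Placeholders: point these at a real Triton model repository
    # and the name of a model inside it.
    MODEL_REPOSITORY = "/path/to/model_repository"
    MODEL_NAME = "my_model"

    # "model-analyzer profile" benchmarks candidate model configurations
    # and writes out reports on their performance trade-offs.
    subprocess.run(
        [
            "model-analyzer", "profile",
            "--model-repository", MODEL_REPOSITORY,
            "--profile-models", MODEL_NAME,
        ],
        check=True,
    )

The same command can of course be run directly in a shell; the subprocess wrapper is just a convenient way to script repeated profiling runs.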
Stars: 507
Forks: 85
Language: Python
License: Apache-2.0
Category: ml-frameworks
Last pushed: Mar 10, 2026
Commits (30d): 4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/triton-inference-server/model_analyzer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
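If you would rather fetch this data from code than from curl, here is a minimal Python sketch using only the standard library. It hits the endpoint shown above anonymously (no key, within the 100 requests/day tier); the response is assumed to be JSON, and since the schema is not documented on this page, the sketch simply pretty-prints whatever comes back.

    import json
    import urllib.request

    # Public endpoint from this page; no API key needed for the
    # anonymous tier (100 requests/day).
    URL = (
        "https://pt-edge.onrender.com/api/v1/quality/"
        "ml-frameworks/triton-inference-server/model_analyzer"
    )

    with urllib.request.urlopen(URL, timeout=10) as resp:
        payload = json.load(resp)

    # Schema is undocumented here, so just pretty-print the payload.
    print(json.dumps(payload, indent=2))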
Related frameworks
triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
gpu-mode/Triton-Puzzles
Puzzles for learning Triton
hailo-ai/hailo_model_zoo
The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment
open-mmlab/mmdeploy
OpenMMLab Model Deployment Framework
hyperai/tvm-cn
TVM documentation in Simplified Chinese