Cre4T3Tiv3/jetson-orin-matmul-analysis
Scientific CUDA benchmarking framework: 4 implementations × 3 power modes × 5 matrix sizes on the Jetson Orin Nano. 1,282 GFLOPS peak, 90% of peak performance at 88% power (25 W mode), 99.5% accuracy validation, and an edge AI deployment guide.
This framework helps embedded systems engineers and AI developers understand real-world matrix-multiplication performance on the NVIDIA Jetson Orin Nano. It runs four CUDA matrix-multiplication implementations across three power modes and five matrix sizes, then produces a detailed benchmark report and visualizations covering performance, power efficiency, and numerical accuracy, helping users optimize their edge AI applications.
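The GFLOPS figures quoted above follow the standard dense-matmul convention of counting 2·N³ floating-point operations per N×N multiply. A minimal sketch of that measurement, using NumPy on the CPU as a stand-in for the repo's CUDA kernels (the function name and repeat count are illustrative, not from the repo):

```python
import time

import numpy as np


def matmul_gflops(n: int, repeats: int = 5) -> float:
    """Time an n x n float32 matmul and return throughput in GFLOPS.

    A dense n x n matmul performs 2 * n**3 floating-point operations
    (n multiplies and n - 1 adds per output element, ~2n per element).
    """
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    np.matmul(a, b)  # warm-up so one-time setup cost is excluded
    start = time.perf_counter()
    for _ in range(repeats):
        np.matmul(a, b)
    elapsed = (time.perf_counter() - start) / repeats
    return (2 * n**3) / elapsed / 1e9
```

On a GPU the same arithmetic applies, but timing must bracket the kernel with device synchronization (e.g. `cudaDeviceSynchronize`) so the measured interval covers the actual computation.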
No commits in the last 6 months.
Use this if you need to choose the best matrix multiplication implementation and power configuration for your neural network or linear algebra workloads on a Jetson Orin Nano.
Not ideal if you are looking to benchmark general-purpose computing tasks or optimize for GPUs other than the Jetson Orin Nano.
Stars: 14
Forks: —
Language: Python
License: MIT
Category:
Last pushed: Oct 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Cre4T3Tiv3/jetson-orin-matmul-analysis"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
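The same endpoint can be queried from Python with only the standard library. This is a sketch assuming the endpoint returns JSON; the response schema is not documented here, and the `Authorization: Bearer` header used for the optional API key is an assumption, not a documented parameter:

```python
import json
import urllib.request
from typing import Optional

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, repo: str) -> str:
    """Compose the quality-API URL for a given category and owner/repo slug."""
    return f"{BASE}/{category}/{repo}"


def fetch_quality(category: str, repo: str, api_key: Optional[str] = None) -> dict:
    """Fetch quality data for a repo; returns the parsed JSON response as-is."""
    req = urllib.request.Request(build_url(category, repo))
    if api_key:
        # Header name is an assumption; check the service docs for the real scheme.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For example, `fetch_quality("ml-frameworks", "Cre4T3Tiv3/jetson-orin-matmul-analysis")` reproduces the curl call above.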
Higher-rated alternatives
triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
gpu-mode/Triton-Puzzles
Puzzles for learning Triton
hailo-ai/hailo_model_zoo
The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment
open-mmlab/mmdeploy
OpenMMLab Model Deployment Framework
hyperai/tvm-cn
TVM Documentation in Chinese Simplified / TVM 中文文档