Cre4T3Tiv3/jetson-orin-matmul-analysis
Scientific CUDA benchmarking framework: 4 implementations × 3 power modes × 5 matrix sizes on the Jetson Orin Nano. 1,282 GFLOPS peak, 90% of peak performance at 88% power (25 W mode), 99.5% accuracy validation, and an edge AI deployment guide.
This framework helps embedded systems engineers and AI developers understand real-world matrix-multiplication performance on the NVIDIA Jetson Orin Nano. It runs four CUDA matrix-multiplication implementations across three power modes and five matrix sizes, then produces a detailed benchmark report and visualizations covering performance, power efficiency, and numerical accuracy, helping users optimize their edge AI applications.
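The GFLOPS figures quoted above follow the standard dense-matmul convention of counting 2·N³ floating-point operations per N×N multiply. A minimal sketch of that measurement, using NumPy on the CPU as a stand-in for the repo's CUDA kernels (the function name and repeat count are illustrative, not from the repo):

```python
import time

import numpy as np


def matmul_gflops(n: int, repeats: int = 5) -> float:
    """Time an n x n float32 matmul and return throughput in GFLOPS.

    A dense n x n matmul performs 2 * n**3 floating-point operations
    (n multiplies and n - 1 adds per output element, ~2n per element).
    """
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    np.matmul(a, b)  # warm-up so one-time setup cost is excluded
    start = time.perf_counter()
    for _ in range(repeats):
        np.matmul(a, b)
    elapsed = (time.perf_counter() - start) / repeats
    return (2 * n**3) / elapsed / 1e9
```

On a GPU the same arithmetic applies, but timing must bracket the kernel with device synchronization (e.g. `cudaDeviceSynchronize`) so the measured interval covers the actual computation.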
No commits in the last 6 months.
Use this if you need to choose the best matrix multiplication implementation and power configuration for your neural network or linear algebra workloads on a Jetson Orin Nano.
Not ideal if you are looking to benchmark general-purpose computing tasks or optimize for GPUs other than the Jetson Orin Nano.
Stars: 14
Forks: —
Language: Python
License: MIT
Category:
Last pushed: Oct 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Cre4T3Tiv3/jetson-orin-matmul-analysis"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
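The same endpoint can be queried from Python with only the standard library. This is a sketch assuming the endpoint returns JSON; the response schema is not documented here, and the `Authorization: Bearer` header used for the optional API key is an assumption, not a documented parameter:

```python
import json
import urllib.request
from typing import Optional

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, repo: str) -> str:
    """Compose the quality-API URL for a given category and owner/repo slug."""
    return f"{BASE}/{category}/{repo}"


def fetch_quality(category: str, repo: str, api_key: Optional[str] = None) -> dict:
    """Fetch quality data for a repo; returns the parsed JSON response as-is."""
    req = urllib.request.Request(build_url(category, repo))
    if api_key:
        # Header name is an assumption; check the service docs for the real scheme.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For example, `fetch_quality("ml-frameworks", "Cre4T3Tiv3/jetson-orin-matmul-analysis")` reproduces the curl call above.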
Higher-rated alternatives
triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
gpu-mode/Triton-Puzzles
Puzzles for learning Triton
hailo-ai/hailo_model_zoo
The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment
open-mmlab/mmdeploy
OpenMMLab Model Deployment Framework
hyperai/tvm-cn
TVM Documentation in Chinese Simplified / TVM 中文文档