ML Inference Benchmarking Frameworks
Standardized benchmarks and performance evaluation frameworks for ML model inference across devices and hardware (GPUs, CPUs, mobile, edge). Does NOT include training benchmarks, model architectures, or optimization techniques without benchmark implementations.
This list tracks 69 ML inference benchmarking frameworks. Two score above 70 (Verified tier). The highest-rated is NVIDIA/TransformerEngine at 73/100 with 3,206 stars. Three of the top 10 are actively maintained.
Get all 69 projects as JSON:

```sh
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=ml-inference-benchmarking&limit=69"
```

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
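If you want to filter the dataset programmatically, a minimal sketch is shown below. It assumes the endpoint returns a JSON array of objects with `name`, `score`, and `tier` fields — that schema is a guess, so check the actual response shape before relying on it:

```python
import json
from urllib.request import urlopen

URL = ("https://pt-edge.onrender.com/api/v1/datasets/quality"
       "?domain=ml-frameworks&subcategory=ml-inference-benchmarking&limit=69")

def verified(projects, threshold=70):
    """Keep only entries scoring above the Verified-tier cutoff (70/100)."""
    return [p for p in projects if p.get("score", 0) > threshold]

# Stub payload mirroring the assumed schema, for offline illustration:
sample = [
    {"name": "NVIDIA/TransformerEngine", "score": 73, "tier": "Verified"},
    {"name": "mlcommons/training", "score": 55, "tier": "Established"},
]
print([p["name"] for p in verified(sample)])
# → ['NVIDIA/TransformerEngine']

# Against the live API (network access required):
#     projects = json.load(urlopen(URL))
#     top = verified(projects)
```

The `score` threshold of 70 matches the Verified-tier cutoff stated above; everything else (field names, response structure) is an assumption about this API.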
| # | Framework | Description | Tier |
|---|---|---|---|
| 1 | NVIDIA/TransformerEngine | A library for accelerating Transformer models on NVIDIA GPUs, including... | Verified |
| 2 | mlcommons/inference | Reference implementations of MLPerf® inference benchmarks | Verified |
| 3 | mlcommons/training | Reference implementations of MLPerf® training benchmarks | Established |
| 4 | datamade/usaddress | :us: a python library for parsing unstructured United States address strings... | Established |
| 5 | GRAAL-Research/deepparse | Deepparse is a state-of-the-art library for parsing multinational street... | Established |
| 6 | mlcommons/storage | MLPerf® Storage Benchmark Suite | Established |
| 7 | CMU-SAFARI/Pythia | A customizable hardware prefetching framework using online reinforcement... | Established |
| 8 | deepspeedai/DeepSpeed-MII | MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. | Established |
| 9 | itlab-vision/dl-benchmark | Deep Learning Inference benchmark. Supports OpenVINO™ toolkit, TensorFlow,... | Established |
| 10 | ise-uiuc/nnsmith | Automated DNN generation for fuzz testing and more | Established |
| 11 | Ki6an/fastT5 | ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. | Established |
| 12 | TristanBilot/mlx-benchmark | Benchmark of Apple MLX operations on all Apple Silicon chips (GPU, CPU) +... | Established |
| 13 | CMU-SAFARI/Hermes | A speculative mechanism to accelerate long-latency off-chip load requests by... | Established |
| 14 | mrdbourke/m1-machine-learning-test | Code for testing various M1 Chip benchmarks with TensorFlow. | Established |
| 15 | Tencent/PocketFlow | An Automatic Model Compression (AutoMC) framework for developing smaller and... | Emerging |
| 16 | Azure/MS-AMP | Microsoft Automatic Mixed Precision Library | Emerging |
| 17 | microsoft/hummingbird | Hummingbird compiles trained ML models into tensor computation for faster inference. | Emerging |
| 18 | XiaoMi/mobile-ai-bench | Benchmarking Neural Network Inference on Mobile Devices | Emerging |
| 19 | mlcommons/inference_results_v5.1 | This repository contains the results and code for the MLPerf® Inference v5.1... | Emerging |
| 20 | OpenBMB/BMInf | Efficient Inference for Big Models | Emerging |
| 21 | AI-performance/embedded-ai.bench | benchmark for embedded-ai deep learning inference engines, such as NCNN /... | Emerging |
| 22 | tlkh/tf-metal-experiments | TensorFlow Metal Backend on Apple Silicon Experiments (just for fun) | Emerging |
| 23 | PEQUAN/hpc-mix-bench | Benchmarks for mixed-precision emulations | Emerging |
| 24 | mlcommons/inference_results_v5.0 | This repository contains the results and code for the MLPerf® Inference v5.0... | Emerging |
| 25 | mlcommons/mlperf_client | MLPerf Client is a benchmark for Windows, Linux and macOS, focusing on... | Emerging |
| 26 | mlalma/MLXUtilsLibrary | Utilities for easing the development of machine learning inference libraries... | Emerging |
| 27 | ise-uiuc/WhiteFox | WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models (OOPSLA 2024) | Emerging |
| 28 | mlcommons/training_results_v4.0 | This repository contains the results and code for the MLPerf™ Training v4.0... | Emerging |
| 29 | hanxiao/umap-mlx | UMAP in pure MLX for Apple Silicon. 30x faster than umap-learn. | Emerging |
| 30 | ProbioticFarmer/mlx-deterministic | Batch-invariant operations for deterministic LLM inference on Apple Silicon using MLX | Emerging |
| 31 | bartbussmann/BatchTopK | Implementation of the BatchTopK activation function for training sparse... | Emerging |
| 32 | RAZZULLIX/fast_topk_batched | High-performance batched Top-K selection for CPU inference. Up to 80x faster... | Emerging |
| 33 | hanxiao/flash-kmeans-mlx | IO-aware batched K-Means for Apple Silicon, ported from Flash-KMeans... | Emerging |
| 34 | ayinedjimi/KVortex | VRAM to RAM Offloader for AI and vLLM - High-Performance C++23 KV Cache... | Emerging |
| 35 | CMU-SAFARI/Pythia-HDL | Implementation of Pythia: A Customizable Hardware Prefetching Framework... | Emerging |
| 36 | CMU-SAFARI/Athena | A reinforcement learning based policy to dynamically coordinate off-chip... | Emerging |
| 37 | killerbotofthenewworld/DDR5-AI-memory-tuner | 🧠 The Ultimate AI-Powered DDR5 Memory Tuning Simulator | Emerging |
| 38 | lin-tan/DocTer | For our ISSTA22 paper "DocTer: Documentation-Guided Fuzzing for Testing Deep... | Emerging |
| 39 | hanxiao/mlx-vis | Pure MLX implementations of UMAP, t-SNE, PaCMAP, TriMap, DREAMS, CNE, and... | Emerging |
| 40 | TristanBilot/mlx-GCN | MLX implementation of GCN, with benchmark on MPS, CUDA and CPU (M1 Pro, M2... | Emerging |
| 41 | ChharithOeun/torch-amd-setup | Auto-detect AMD GPU for PyTorch — ROCm, DirectML, CUDA, MPS, CPU. Fixes... | Emerging |
| 42 | ise-uiuc/DeepREL | Fuzzing Deep-Learning Libraries via Automated Relational API Inference... | Emerging |
| 43 | eembc/energyrunner | The EEMBC EnergyRunner application framework for the MLPerf Tiny benchmark. | Emerging |
| 44 | aallan/benchmarking-ml-on-the-edge | Benchmarking machine learning inferencing on embedded hardware. | Experimental |
| 45 | kqb/mlx-od-moe | On-Demand Mixture of Experts for Apple Silicon — run 375GB models in 192GB RAM | Experimental |
| 46 | cotesiito/flashtensors | 🚀 Accelerate your AI projects with flashtensors, a fast inference engine... | Experimental |
| 47 | ise-uiuc/NablaFuzz | Fuzzing Automatic Differentiation in Deep-Learning Libraries (ICSE'23) | Experimental |
| 48 | hanxiao/pacmap-mlx | PaCMAP in pure MLX for Apple Silicon. Pure GPU, no scipy/numba. | Experimental |
| 49 | 99roomz/lokly | Address parser for Indian Addresses - Demo at | Experimental |
| 50 | mctosima/mlx_playground | Run Image Classification on Apple Silicon (Mac) | Experimental |
| 51 | Rianbajukendari/mini-infer | 🚀 Accelerate LLM inference with Mini-Infer, a high-performance engine... | Experimental |
| 52 | SYSU-Video/MFIBA | MFIBA: Multiscale Feature Importance-based Bit Allocation for End-to-End... | Experimental |
| 53 | hollance/metal-gpgpu | Collection of notes on how to use Apple’s Metal API for compute tasks | Experimental |
| 54 | instax-dutta/easy-mlx | easy-mlx — Local AI runtime for Apple Silicon powered by MLX. | Experimental |
| 55 | RobotFlow-Labs/container-toolkit-mlx | GPU-accelerated MLX inference for Linux containers on Apple Silicon. The... | Experimental |
| 56 | Kokotpica/surogate | 🚀 Accelerate large language model training and fine-tuning with Surogate’s... | Experimental |
| 57 | makgunay/research-mlx-ui | Autonomous ML research on Apple Silicon — Karpathy's autoresearch with MLX +... | Experimental |
| 58 | chrispion/fast_topk_batched | 🚀 Accelerate CPU inference with Fast TopK for high-performance batched Top-K... | Experimental |
| 59 | kossisoroyce/timber-benchmarks | Benchmarks for Timber AOT compiler: zero-RAM tree-based ML inference and... | Experimental |
| 60 | ChharithOeun/directml-benchmark | Reproducible GPU float32 benchmarks — AMD DirectML 40.2x speedup on RX 5700... | Experimental |
| 61 | milliaccount/SynapSwap | 🔄 Transform your GPU's VRAM limits with SynapSwap, a predictive... | Experimental |
| 62 | ssmall256/mps-kernels-skill | Skill pack for custom PyTorch MPS kernels on Apple Silicon (examples, tests,... | Experimental |
| 63 | billyzs/bench | Demo for using Google Benchmark and Apple's MLX | Experimental |
| 64 | vladBaciu/MLino-Bench | MLino bench: A comprehensive benchmarking tool for evaluating ML models on... | Experimental |
| 65 | hogeheer499-commits/strix-halo-guide | 57 t/s LLM inference on AMD Ryzen AI MAX+ 395 — the complete optimization... | Experimental |
| 66 | RobotFlow-Labs/LeRobot-mlx | LeRobot-MLX: HuggingFace LeRobot ported to Apple MLX for native Apple... | Experimental |
| 67 | DahsjsDio/mlx-vis | Accelerate high-speed dimensionality reduction on Apple Silicon with pure... | Experimental |
| 68 | cmontemuino/amd-mi300x-research-data | Research datasets and experimental results from comprehensive ML... | Experimental |
| 69 | dc-dc-dc/mlx-lite | A package for running tflite files in MLX. | Experimental |