ML Inference Benchmarking Frameworks

Standardized benchmarks and performance evaluation frameworks for ML model inference across devices and hardware (GPUs, CPUs, mobile, edge). Does NOT include training benchmarks, model architectures, or optimization techniques without benchmark implementations.

69 ML inference benchmarking frameworks are tracked. 2 score above 70 (Verified tier). The highest-rated is NVIDIA/TransformerEngine at 73/100 with 3,206 stars. 3 of the top 10 are actively maintained.

Get all 69 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=ml-inference-benchmarking&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
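The curl request above can also be issued from a script. The following is a minimal Python sketch that uses only the standard library. The endpoint and the three query parameters are copied from the curl command; the helper names `build_url` and `fetch_quality` are hypothetical, and the shape of the returned JSON is not documented here, so the code returns whatever the server sends without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ml-frameworks",
              subcategory="ml-inference-benchmarking",
              limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(url, opener=urlopen, timeout=30):
    """Fetch the URL and decode the body as JSON.

    The response schema is an unknown here, so the parsed object is
    returned unchanged. `opener` is injectable to allow offline testing.
    """
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a real network request and counts against the daily quota):
# data = fetch_quality(build_url())
# print(json.dumps(data, indent=2)[:500])
```

Note that the example URL passes `limit=20`. Whether a larger value returns all 69 projects in one response is not stated, so treat any other `limit` as an experiment.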

| # | Framework | Description | Score | Tier |
| --- | --- | --- | --- | --- |
| 1 | NVIDIA/TransformerEngine | A library for accelerating Transformer models on NVIDIA GPUs, including... | 73 | Verified |
| 2 | mlcommons/inference | Reference implementations of MLPerf® inference benchmarks | 71 | Verified |
| 3 | mlcommons/training | Reference implementations of MLPerf® training benchmarks | 64 | Established |
| 4 | datamade/usaddress | :us: a python library for parsing unstructured United States address strings... | 62 | Established |
| 5 | GRAAL-Research/deepparse | Deepparse is a state-of-the-art library for parsing multinational street... | 61 | Established |
| 6 | mlcommons/storage | MLPerf® Storage Benchmark Suite | 58 | Established |
| 7 | CMU-SAFARI/Pythia | A customizable hardware prefetching framework using online reinforcement... | 58 | Established |
| 8 | deepspeedai/DeepSpeed-MII | MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. | 57 | Established |
| 9 | itlab-vision/dl-benchmark | Deep Learning Inference benchmark. Supports OpenVINO™ toolkit, TensorFlow,... | 57 | Established |
| 10 | ise-uiuc/nnsmith | Automated DNN generation for fuzz testing and more | 56 | Established |
| 11 | Ki6an/fastT5 | ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. | 54 | Established |
| 12 | TristanBilot/mlx-benchmark | Benchmark of Apple MLX operations on all Apple Silicon chips (GPU, CPU) +... | 53 | Established |
| 13 | CMU-SAFARI/Hermes | A speculative mechanism to accelerate long-latency off-chip load requests by... | 51 | Established |
| 14 | mrdbourke/m1-machine-learning-test | Code for testing various M1 Chip benchmarks with TensorFlow. | 51 | Established |
| 15 | Tencent/PocketFlow | An Automatic Model Compression (AutoMC) framework for developing smaller and... | 49 | Emerging |
| 16 | Azure/MS-AMP | Microsoft Automatic Mixed Precision Library | 48 | Emerging |
| 17 | microsoft/hummingbird | Hummingbird compiles trained ML models into tensor computation for faster inference. | 47 | Emerging |
| 18 | XiaoMi/mobile-ai-bench | Benchmarking Neural Network Inference on Mobile Devices | 46 | Emerging |
| 19 | mlcommons/inference_results_v5.1 | This repository contains the results and code for the MLPerf® Inference v5.1... | 45 | Emerging |
| 20 | OpenBMB/BMInf | Efficient Inference for Big Models | 45 | Emerging |
| 21 | AI-performance/embedded-ai.bench | benchmark for embededded-ai deep learning inference engines, such as NCNN /... | 43 | Emerging |
| 22 | tlkh/tf-metal-experiments | TensorFlow Metal Backend on Apple Silicon Experiments (just for fun) | 42 | Emerging |
| 23 | PEQUAN/hpc-mix-bench | Benchmarks for mixed-precision emulations | 42 | Emerging |
| 24 | mlcommons/inference_results_v5.0 | This repository contains the results and code for the MLPerf® Inference v5.0... | 41 | Emerging |
| 25 | mlcommons/mlperf_client | MLPerf Client is a benchmark for Windows, Linux and macOS, focusing on... | 41 | Emerging |
| 26 | mlalma/MLXUtilsLibrary | Utilities for easing the development of machine learning inference libraries... | 41 | Emerging |
| 27 | ise-uiuc/WhiteFox | WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models (OOPSLA 2024) | 40 | Emerging |
| 28 | mlcommons/training_results_v4.0 | This repository contains the results and code for the MLPerf™ Training v4.0... | 40 | Emerging |
| 29 | hanxiao/umap-mlx | UMAP in pure MLX for Apple Silicon. 30x faster than umap-learn. | 38 | Emerging |
| 30 | ProbioticFarmer/mlx-deterministic | Batch-invariant operations for deterministic LLM inference on Apple Silicon using MLX | 38 | Emerging |
| 31 | bartbussmann/BatchTopK | Implementation of the BatchTopK activation function for training sparse... | 37 | Emerging |
| 32 | RAZZULLIX/fast_topk_batched | High-performance batched Top-K selection for CPU inference. Up to 80x faster... | 36 | Emerging |
| 33 | hanxiao/flash-kmeans-mlx | IO-aware batched K-Means for Apple Silicon, ported from Flash-KMeans... | 36 | Emerging |
| 34 | ayinedjimi/KVortex | VRAM to RAM Offloader for AI and vLLM - High-Performance C++23 KV Cache... | 35 | Emerging |
| 35 | CMU-SAFARI/Pythia-HDL | Implementation of Pythia: A Customizable Hardware Prefetching Framework... | 35 | Emerging |
| 36 | CMU-SAFARI/Athena | A reinforcement learning based policy to dynamically coordinate off-chip... | 34 | Emerging |
| 37 | killerbotofthenewworld/DDR5-AI-memory-tuner | 🧠 The Ultimate AI-Powered DDR5 Memory Tuning Simulator | 33 | Emerging |
| 38 | lin-tan/DocTer | For our ISSTA22 paper "DocTer: Documentation-Guided Fuzzing for Testing Deep... | 33 | Emerging |
| 39 | hanxiao/mlx-vis | Pure MLX implementations of UMAP, t-SNE, PaCMAP, TriMap, DREAMS, CNE, and... | 33 | Emerging |
| 40 | TristanBilot/mlx-GCN | MLX implementation of GCN, with benchmark on MPS, CUDA and CPU (M1 Pro, M2... | 33 | Emerging |
| 41 | ChharithOeun/torch-amd-setup | Auto-detect AMD GPU for PyTorch — ROCm, DirectML, CUDA, MPS, CPU. Fixes... | 32 | Emerging |
| 42 | ise-uiuc/DeepREL | Fuzzing Deep-Learning Libraries via Automated Relational API Inference... | 30 | Emerging |
| 43 | eembc/energyrunner | The EEMBC EnergyRunner application framework for the MLPerf Tiny benchmark. | 30 | Emerging |
| 44 | aallan/benchmarking-ml-on-the-edge | Benchmarking machine learning inferencing on embedded hardware. | 29 | Experimental |
| 45 | kqb/mlx-od-moe | On-Demand Mixture of Experts for Apple Silicon — run 375GB models in 192GB RAM | 29 | Experimental |
| 46 | cotesiito/flashtensors | 🚀 Accelerate your AI projects with flashtensors, a fast inference engine... | 27 | Experimental |
| 47 | ise-uiuc/NablaFuzz | Fuzzing Automatic Differentiation in Deep-Learning Libraries (ICSE'23) | 27 | Experimental |
| 48 | hanxiao/pacmap-mlx | PaCMAP in pure MLX for Apple Silicon. Pure GPU, no scipy/numba. | 27 | Experimental |
| 49 | 99roomz/lokly | Address parser for Indian Addresses - Demo at | 27 | Experimental |
| 50 | mctosima/mlx_playground | Run Image Classification on Apple Silicon (Mac) | 26 | Experimental |
| 51 | Rianbajukendari/mini-infer | 🚀 Accelerate LLM inference with Mini-Infer, a high-performance engine... | 24 | Experimental |
| 52 | SYSU-Video/MFIBA | MFIBA: Multiscale Feature Importance-based Bit Allocation for End-to-End... | 24 | Experimental |
| 53 | hollance/metal-gpgpu | Collection of notes on how to use Apple’s Metal API for compute tasks | 24 | Experimental |
| 54 | instax-dutta/easy-mlx | easy-mlx — Local AI runtime for Apple Silicon powered by MLX. | 23 | Experimental |
| 55 | RobotFlow-Labs/container-toolkit-mlx | GPU-accelerated MLX inference for Linux containers on Apple Silicon. The... | 23 | Experimental |
| 56 | Kokotpica/surogate | 🚀 Accelerate large language model training and fine-tuning with Surogate’s... | 23 | Experimental |
| 57 | makgunay/research-mlx-ui | Autonomous ML research on Apple Silicon — Karpathy's autoresearch with MLX +... | 22 | Experimental |
| 58 | chrispion/fast_topk_batched | 🚀 Accelerate CPU inference with Fast TopK for high-performance batched Top-K... | 22 | Experimental |
| 59 | kossisoroyce/timber-benchmarks | Benchmarks for Timber AOT compiler: zero-RAM tree-based ML inference and... | 22 | Experimental |
| 60 | ChharithOeun/directml-benchmark | Reproducible GPU float32 benchmarks — AMD DirectML 40.2x speedup on RX 5700... | 22 | Experimental |
| 61 | milliaccount/SynapSwap | 🔄 Transform your GPU's VRAM limits with SynapSwap, a predictive... | 21 | Experimental |
| 62 | ssmall256/mps-kernels-skill | Skill pack for custom PyTorch MPS kernels on Apple Silicon (examples, tests,... | 19 | Experimental |
| 63 | billyzs/bench | Demo for using Google Benchmark and Apple's MLX | 19 | Experimental |
| 64 | vladBaciu/MLino-Bench | MLino bench: A comprehensive benchmarking tool for evaluating ML models on... | 19 | Experimental |
| 65 | hogeheer499-commits/strix-halo-guide | 57 t/s LLM inference on AMD Ryzen AI MAX+ 395 — the complete optimization... | 16 | Experimental |
| 66 | RobotFlow-Labs/LeRobot-mlx | LeRobot-MLX: HuggingFace LeRobot ported to Apple MLX for native Apple... | 14 | Experimental |
| 67 | DahsjsDio/mlx-vis | Accelerate high-speed dimensionality reduction on Apple Silicon with pure... | 14 | Experimental |
| 68 | cmontemuino/amd-mi300x-research-data | Research datasets and experimental results from comprehensive ML... | 13 | Experimental |
| 69 | dc-dc-dc/mlx-lite | A package for running tflite files in MLX. | 12 | Experimental |
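
The tier labels in the listing appear to follow score bands. The sketch below encodes cutoffs inferred only from the rows shown here: the lowest Verified score listed is 71, Established spans 64 down to 51, Emerging spans 49 down to 30, and Experimental starts at 29. The exact boundaries at 70 and 50 are assumptions, since no listed project has those scores, and the function name `tier_for` is hypothetical rather than part of any published API.

```python
def tier_for(score: int) -> str:
    """Map a 0-100 quality score to a tier label.

    Cutoffs are inferred from the listing, not from official documentation:
    the 70 and 50 boundaries are assumptions; the 30 boundary is directly
    observed (30 is Emerging, 29 is Experimental).
    """
    if score >= 70:
        return "Verified"
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# Spot checks against rows from the listing.
for name, score in [
    ("NVIDIA/TransformerEngine", 73),
    ("mrdbourke/m1-machine-learning-test", 51),
    ("Tencent/PocketFlow", 49),
    ("eembc/energyrunner", 30),
    ("aallan/benchmarking-ml-on-the-edge", 29),
]:
    print(f"{name}: {score} -> {tier_for(score)}")
```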