ML Inference Benchmarking: ML Frameworks

This category covers standardized benchmarks and performance-evaluation frameworks for ML model inference across hardware (GPUs, CPUs, mobile, and edge devices). It does NOT include training benchmarks, model architectures, or optimization techniques that lack a benchmark implementation.

This category tracks 69 ML inference benchmarking frameworks. Two score above 70 (Verified tier). The highest rated is NVIDIA/TransformerEngine at 73/100 with 3,206 stars. Three of the top 10 are actively maintained.
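The tier labels in the table below appear to follow fixed score bands. The page states only that the Verified tier sits above 70. The other cutoffs in this sketch (50 for Established, 30 for Emerging) are assumptions inferred from the listed scores, and whether a score exactly on a boundary lands in the higher tier is a guess. The sketch maps a score to a tier under those assumptions:

```python
# Hypothetical tier cutoffs inferred from the scores listed on this page.
# Only "Verified = above 70" is stated; 50 and 30 are assumptions, and
# treating a boundary score as the higher tier (>=) is also an assumption.
TIER_CUTOFFS = [
    (70, "Verified"),
    (50, "Established"),
    (30, "Emerging"),
]


def tier_for(score: int) -> str:
    """Map a 0-100 quality score to a tier label using the assumed cutoffs."""
    for cutoff, label in TIER_CUTOFFS:
        if score >= cutoff:
            return label
    return "Experimental"


# A few (name, score) rows copied from the table.
sample = [
    ("NVIDIA/TransformerEngine", 73),
    ("mlcommons/training", 64),
    ("Tencent/PocketFlow", 49),
    ("aallan/benchmarking-ml-on-the-edge", 29),
]

for name, score in sample:
    print(f"{name}: {score} -> {tier_for(score)}")
```

Every score listed in the table falls into the tier these assumed bands predict, but the bands themselves are not published here.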

Fetch the project list as JSON. The example request below sets limit=20, so it returns at most 20 of the 69 projects:

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=ml-inference-benchmarking&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
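For scripted access, the curl request can be reproduced with the Python standard library. This is a minimal sketch: the helper names build_url and fetch are hypothetical, the response schema is not documented on this page, and how the optional API key is passed (header or query parameter) is not described here, so the sketch omits it.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the same query string as the curl example (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch(url: str, timeout: float = 30.0):
    """GET the URL and decode the body as JSON; field names are not documented here."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling fetch(build_url("ml-frameworks", "ml-inference-benchmarking")) sends the same request as the curl line and counts against the same daily quota.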

#	Framework and description	Score (/100)	Tier	Stars	Language
1	NVIDIA/TransformerEngine A library for accelerating Transformer models on NVIDIA GPUs, including...	73	Verified	3,206	Python
2	mlcommons/inference Reference implementations of MLPerf® inference benchmarks	71	Verified	1,539	Python
3	mlcommons/training Reference implementations of MLPerf® training benchmarks	64	Established	1,748	Python
4	datamade/usaddress :us: a python library for parsing unstructured United States address strings...	62	Established	1,618	Python
5	GRAAL-Research/deepparse Deepparse is a state-of-the-art library for parsing multinational street...	61	Established	332	Python
6	mlcommons/storage MLPerf® Storage Benchmark Suite	58	Established	175	Python
7	CMU-SAFARI/Pythia A customizable hardware prefetching framework using online reinforcement...	58	Established	158	C++
8	deepspeedai/DeepSpeed-MII MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.	57	Established	2,099	Python
9	itlab-vision/dl-benchmark Deep Learning Inference benchmark. Supports OpenVINO™ toolkit, TensorFlow,...	57	Established	35	HTML
10	ise-uiuc/nnsmith Automated DNN generation for fuzz testing and more	56	Established	144	Python
11	Ki6an/fastT5 ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.	54	Established	589	Python
12	TristanBilot/mlx-benchmark Benchmark of Apple MLX operations on all Apple Silicon chips (GPU, CPU) +...	53	Established	217	Python
13	CMU-SAFARI/Hermes A speculative mechanism to accelerate long-latency off-chip load requests by...	51	Established	77	C++
14	mrdbourke/m1-machine-learning-test Code for testing various M1 Chip benchmarks with TensorFlow.	51	Established	536	Jupyter Notebook
15	Tencent/PocketFlow An Automatic Model Compression (AutoMC) framework for developing smaller and...	49	Emerging	2,914	Python
16	Azure/MS-AMP Microsoft Automatic Mixed Precision Library	48	Emerging	634	Python
17	microsoft/hummingbird Hummingbird compiles trained ML models into tensor computation for faster inference.	47	Emerging	3,530	Python
18	XiaoMi/mobile-ai-bench Benchmarking Neural Network Inference on Mobile Devices	46	Emerging	386	C++
19	mlcommons/inference_results_v5.1 This repository contains the results and code for the MLPerf® Inference v5.1...	45	Emerging	3	HTML
20	OpenBMB/BMInf Efficient Inference for Big Models	45	Emerging	587	Python
21	AI-performance/embedded-ai.bench benchmark for embedded-ai deep learning inference engines, such as NCNN /...	43	Emerging	202	Python
22	tlkh/tf-metal-experiments TensorFlow Metal Backend on Apple Silicon Experiments (just for fun)	42	Emerging	280	Jupyter Notebook
23	PEQUAN/hpc-mix-bench Benchmarks for mixed-precision emulations	42	Emerging	1	C++
24	mlcommons/inference_results_v5.0 This repository contains the results and code for the MLPerf® Inference v5.0...	41	Emerging	12	HTML
25	mlcommons/mlperf_client MLPerf Client is a benchmark for Windows, Linux and macOS, focusing on...	41	Emerging	80	C++
26	mlalma/MLXUtilsLibrary Utilities for easing the development of machine learning inference libraries...	41	Emerging	2	Swift
27	ise-uiuc/WhiteFox WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models (OOPSLA 2024)	40	Emerging	80	Python
28	mlcommons/training_results_v4.0 This repository contains the results and code for the MLPerf™ Training v4.0...	40	Emerging	12	Python
29	hanxiao/umap-mlx UMAP in pure MLX for Apple Silicon. 30x faster than umap-learn.	38	Emerging	40	Python
30	ProbioticFarmer/mlx-deterministic Batch-invariant operations for deterministic LLM inference on Apple Silicon using MLX	38	Emerging	7	Python
31	bartbussmann/BatchTopK Implementation of the BatchTopK activation function for training sparse...	37	Emerging	61	Python
32	RAZZULLIX/fast_topk_batched High-performance batched Top-K selection for CPU inference. Up to 80x faster...	36	Emerging	16	C++
33	hanxiao/flash-kmeans-mlx IO-aware batched K-Means for Apple Silicon, ported from Flash-KMeans...	36	Emerging	11	Python
34	ayinedjimi/KVortex VRAM to RAM Offloader for AI and vLLM - High-Performance C++23 KV Cache...	35	Emerging	2	C++
35	CMU-SAFARI/Pythia-HDL Implementation of Pythia: A Customizable Hardware Prefetching Framework...	35	Emerging	17	Scala
36	CMU-SAFARI/Athena A reinforcement learning based policy to dynamically coordinate off-chip...	34	Emerging	8	C++
37	killerbotofthenewworld/DDR5-AI-memory-tuner 🧠 The Ultimate AI-Powered DDR5 Memory Tuning Simulator	33	Emerging	8	Python
38	lin-tan/DocTer For our ISSTA22 paper "DocTer: Documentation-Guided Fuzzing for Testing Deep...	33	Emerging	39	—
39	hanxiao/mlx-vis Pure MLX implementations of UMAP, t-SNE, PaCMAP, TriMap, DREAMS, CNE, and...	33	Emerging	65	Python
40	TristanBilot/mlx-GCN MLX implementation of GCN, with benchmark on MPS, CUDA and CPU (M1 Pro, M2...	33	Emerging	25	Python
41	ChharithOeun/torch-amd-setup Auto-detect AMD GPU for PyTorch — ROCm, DirectML, CUDA, MPS, CPU. Fixes...	32	Emerging	1	Python
42	ise-uiuc/DeepREL Fuzzing Deep-Learning Libraries via Automated Relational API Inference...	30	Emerging	40	Python
43	eembc/energyrunner The EEMBC EnergyRunner application framework for the MLPerf Tiny benchmark.	30	Emerging	21	—
44	aallan/benchmarking-ml-on-the-edge Benchmarking machine learning inferencing on embedded hardware.	29	Experimental	26	Python
45	kqb/mlx-od-moe On-Demand Mixture of Experts for Apple Silicon — run 375GB models in 192GB RAM	29	Experimental	1	Python
46	cotesiito/flashtensors 🚀 Accelerate your AI projects with flashtensors, a fast inference engine...	27	Experimental	10	Python
47	ise-uiuc/NablaFuzz Fuzzing Automatic Differentiation in Deep-Learning Libraries (ICSE'23)	27	Experimental	27	Python
48	hanxiao/pacmap-mlx PaCMAP in pure MLX for Apple Silicon. Pure GPU, no scipy/numba.	27	Experimental	19	Python
49	99roomz/lokly Address parser for Indian Addresses - Demo at	27	Experimental	6	HTML
50	mctosima/mlx_playground Run Image Classification on Apple Silicon (Mac)	26	Experimental	8	Python
51	Rianbajukendari/mini-infer 🚀 Accelerate LLM inference with Mini-Infer, a high-performance engine...	24	Experimental	—	Python
52	SYSU-Video/MFIBA MFIBA: Multiscale Feature Importance-based Bit Allocation for End-to-End...	24	Experimental	3	Python
53	hollance/metal-gpgpu Collection of notes on how to use Apple’s Metal API for compute tasks	24	Experimental	107	—
54	instax-dutta/easy-mlx easy-mlx — Local AI runtime for Apple Silicon powered by MLX.	23	Experimental	1	Python
55	RobotFlow-Labs/container-toolkit-mlx GPU-accelerated MLX inference for Linux containers on Apple Silicon. The...	23	Experimental	1	Swift
56	Kokotpica/surogate 🚀 Accelerate large language model training and fine-tuning with Surogate’s...	23	Experimental	—	C++
57	makgunay/research-mlx-ui Autonomous ML research on Apple Silicon — Karpathy's autoresearch with MLX +...	22	Experimental	—	Python
58	chrispion/fast_topk_batched 🚀 Accelerate CPU inference with Fast TopK for high-performance batched Top-K...	22	Experimental	—	C++
59	kossisoroyce/timber-benchmarks Benchmarks for Timber AOT compiler: zero-RAM tree-based ML inference and...	22	Experimental	—	C
60	ChharithOeun/directml-benchmark Reproducible GPU float32 benchmarks — AMD DirectML 40.2x speedup on RX 5700...	22	Experimental	—	Python
61	milliaccount/SynapSwap 🔄 Transform your GPU's VRAM limits with SynapSwap, a predictive...	21	Experimental	—	C
62	ssmall256/mps-kernels-skill Skill pack for custom PyTorch MPS kernels on Apple Silicon (examples, tests,...	19	Experimental	—	Python
63	billyzs/bench Demo for using Google Benchmark and Apple's MLX	19	Experimental	3	CMake
64	vladBaciu/MLino-Bench MLino bench: A comprehensive benchmarking tool for evaluating ML models on...	19	Experimental	3	C++
65	hogeheer499-commits/strix-halo-guide 57 t/s LLM inference on AMD Ryzen AI MAX+ 395 — the complete optimization...	16	Experimental	3	—
66	RobotFlow-Labs/LeRobot-mlx LeRobot-MLX: HuggingFace LeRobot ported to Apple MLX for native Apple...	14	Experimental	—	Python
67	DahsjsDio/mlx-vis Accelerate high-speed dimensionality reduction on Apple Silicon with pure...	14	Experimental	—	Python
68	cmontemuino/amd-mi300x-research-data Research datasets and experimental results from comprehensive ML...	13	Experimental	—	—
69	dc-dc-dc/mlx-lite A package for running tflite files in MLX.	12	Experimental	7	C++

Comparisons in this category

mlcommons/inference vs. mlcommons/training (71 vs 64)
mlcommons/inference vs. mlcommons/inference_results_v5.1 (71 vs 45)