mlcommons/inference
Reference implementations of MLPerf® inference benchmarks
This project offers standardized benchmarks to measure how quickly various systems can run machine learning models across different deployment scenarios. It takes machine learning models (such as ResNet, BERT, and Llama 2) and system configurations as input, and reports performance metrics such as inference speed. System architects, hardware engineers, and ML platform developers use it to compare and optimize the performance of their AI systems.
1,539 stars. Actively maintained with 25 commits in the last 30 days.
Use this if you need to objectively evaluate and compare the inference speed of different hardware and software configurations for your machine learning deployments.
Not ideal if you are looking for tools to train machine learning models or to optimize model accuracy rather than deployment performance.
Stars: 1,539
Forks: 612
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 12, 2026
Commits (30d): 25
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/mlcommons/inference"
Open to everyone: 100 requests/day with no key. A free API key raises the limit to 1,000 requests/day.
Related frameworks
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit...
mlcommons/training
Reference implementations of MLPerf® training benchmarks
datamade/usaddress
:us: a python library for parsing unstructured United States address strings into address components
GRAAL-Research/deepparse
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning
CMU-SAFARI/Pythia
A customizable hardware prefetching framework using online reinforcement learning as described...