mlcommons/inference

Reference implementations of MLPerf® inference benchmarks

Score: 71 / 100 · Verified

This project provides standardized benchmarks for measuring how quickly systems run machine learning models across different deployment scenarios. Given a model (such as ResNet, BERT, or Llama 2) and a system configuration, it produces performance metrics such as inference speed. System architects, hardware engineers, and ML platform developers use it to compare and optimize the performance of their AI systems.

1,539 stars. Actively maintained with 25 commits in the last 30 days.

Use this if you need to objectively evaluate and compare the inference speed of different hardware and software configurations for your machine learning deployments.

Not ideal if you are looking for tools to train machine learning models or to optimize model accuracy rather than deployment performance.

AI system performance · ML model deployment · hardware benchmarking · system optimization · inference speed evaluation
No package · No dependents
Maintenance 20 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 25 / 25

How are scores calculated?

Stars: 1,539
Forks: 612
Language: Python
License: Apache-2.0
Last pushed: Mar 12, 2026
Commits (30d): 25

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/mlcommons/inference"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.