SemiAnalysisAI/InferenceX

Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Trainium2/3

Quality score: 69 / 100 (Established)

This project provides continuous, real-time benchmarks of large language model (LLM) inference performance. It runs various open-source inference frameworks across hardware configurations and publishes up-to-date metrics on token throughput and efficiency. It is aimed at LLM operators, machine learning engineers, and researchers who deploy large-scale AI models.

655 stars. Actively maintained with 64 commits in the last 30 days.

Use this if you need to continuously track and compare the real-world performance of different LLM inference software stacks and hardware combinations.

Not ideal if you are looking for benchmarks of small-scale AI models or for general-purpose computing tasks outside of LLM inference.

LLM-operations AI-infrastructure ML-benchmarking datacenter-optimization deep-learning-deployment
No Package · No Dependents
Maintenance 22 / 25
Adoption 10 / 25
Maturity 15 / 25
Community 22 / 25

How are scores calculated?

Stars: 655
Forks: 99
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 64

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/SemiAnalysisAI/InferenceX"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.