tensorchord/inference-benchmark
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
This tool helps machine learning engineers and MLOps practitioners understand and improve how quickly their models respond to requests in production. It sends simulated user traffic to a deployed model endpoint (such as a large language model, image generator, or embedding model) and reports performance metrics like latency and throughput, helping you identify bottlenecks and tune your serving infrastructure.
No commits in the last 6 months.
Use this if you need to objectively compare the speed and efficiency of different ways to deploy and serve your machine learning models.
Not ideal if you are looking to benchmark the training speed of your models or compare model accuracy.
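As a rough illustration of the kind of numbers such a benchmark reports (a hypothetical sketch, not this tool's actual implementation), a load-test loop typically records a latency per request and then reduces them to percentiles and throughput:

```python
import statistics

def summarize(latencies_s, total_duration_s):
    """Reduce raw per-request latencies (seconds) to common serving metrics."""
    ordered = sorted(latencies_s)
    n = len(ordered)
    return {
        "requests": n,
        "throughput_rps": n / total_duration_s,        # requests per second
        "mean_ms": statistics.mean(ordered) * 1000,
        "p50_ms": ordered[n // 2] * 1000,              # median latency
        "p99_ms": ordered[min(n - 1, int(n * 0.99))] * 1000,  # tail latency
    }

# Simulated latencies standing in for real timed requests against an endpoint.
fake_latencies = [0.10, 0.12, 0.11, 0.50, 0.10]
metrics = summarize(fake_latencies, total_duration_s=1.0)
```

Comparing p99 against the mean is what surfaces tail-latency problems that averages hide, which is the usual reason to benchmark serving stacks rather than trust a single timing.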
Stars: 28
Forks: 3
Language: Python
License: —
Category: —
Last pushed: Jun 28, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tensorchord/inference-benchmark"
Open to everyone: 100 requests/day with no API key; a free key raises the limit to 1,000 requests/day.
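The same endpoint can be called from Python. The sketch below is a minimal, hypothetical client: the response schema is not documented here, so it returns the parsed JSON as-is, and the `Authorization: Bearer` header used for the optional API key is an assumption, not a documented parameter.

```python
import json
import urllib.request

API_URL = (
    "https://pt-edge.onrender.com/api/v1/quality/"
    "transformers/tensorchord/inference-benchmark"
)

def fetch_quality(url=API_URL, api_key=None):
    """Fetch this listing's quality data and return the parsed JSON.

    The schema is undocumented here, so no fields are assumed.
    The API-key header name is a guess; check the service's docs.
    """
    req = urllib.request.Request(url)
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")  # assumed header
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(fetch_quality())
```

Without a key this stays within the 100 requests/day free tier; pass `api_key=...` once you have one.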
Higher-rated alternatives
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...