tensorchord/inference-benchmark
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
This tool helps machine learning engineers and MLOps practitioners understand and improve how quickly their models respond to requests in production. It sends simulated user traffic to a deployed model endpoint (such as a large language model, image generator, or embedding model) and reports performance metrics like latency and throughput, helping you identify bottlenecks and tune your serving infrastructure.
No commits in the last 6 months.
Use this if you need to objectively compare the speed and efficiency of different ways to deploy and serve your machine learning models.
Not ideal if you are looking to benchmark the training speed of your models or compare model accuracy.
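As a rough illustration of the kind of numbers such a benchmark reports (a hypothetical sketch, not this tool's actual implementation), a load-test loop typically records a latency per request and then reduces them to percentiles and throughput:

```python
import statistics

def summarize(latencies_s, total_duration_s):
    """Reduce raw per-request latencies (seconds) to common serving metrics."""
    ordered = sorted(latencies_s)
    n = len(ordered)
    return {
        "requests": n,
        "throughput_rps": n / total_duration_s,        # requests per second
        "mean_ms": statistics.mean(ordered) * 1000,
        "p50_ms": ordered[n // 2] * 1000,              # median latency
        "p99_ms": ordered[min(n - 1, int(n * 0.99))] * 1000,  # tail latency
    }

# Simulated latencies standing in for real timed requests against an endpoint.
fake_latencies = [0.10, 0.12, 0.11, 0.50, 0.10]
metrics = summarize(fake_latencies, total_duration_s=1.0)
```

Comparing p99 against the mean is what surfaces tail-latency problems that averages hide, which is the usual reason to benchmark serving stacks rather than trust a single timing.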
Stars: 28
Forks: 3
Language: Python
License: —
Category: —
Last pushed: Jun 28, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tensorchord/inference-benchmark"
Open to everyone: 100 requests/day with no API key; a free key raises the limit to 1,000 requests/day.
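The same endpoint can be called from Python. The sketch below is a minimal, hypothetical client: the response schema is not documented here, so it returns the parsed JSON as-is, and the `Authorization: Bearer` header used for the optional API key is an assumption, not a documented parameter.

```python
import json
import urllib.request

API_URL = (
    "https://pt-edge.onrender.com/api/v1/quality/"
    "transformers/tensorchord/inference-benchmark"
)

def fetch_quality(url=API_URL, api_key=None):
    """Fetch this listing's quality data and return the parsed JSON.

    The schema is undocumented here, so no fields are assumed.
    The API-key header name is a guess; check the service's docs.
    """
    req = urllib.request.Request(url)
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")  # assumed header
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(fetch_quality())
```

Without a key this stays within the 100 requests/day free tier; pass `api_key=...` once you have one.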
Higher-rated alternatives
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...