tensorchord/inference-benchmark

Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)

Quality score: 24 / 100 (Experimental)

This tool helps machine learning engineers and MLOps professionals understand and improve how quickly their AI models respond to requests in production. It targets your deployed models (large language models, image generators, embedding models, or speech-to-text models like Whisper), simulates user traffic against them, and reports performance metrics such as latency and throughput. The output helps you identify bottlenecks and optimize your model-serving infrastructure.
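
As a rough sketch of the kind of measurement this benchmark automates (not the tool's own API), the snippet below times concurrent POST requests against a hypothetical inference endpoint and reports latency percentiles and throughput. The URL, payload, and request counts are placeholders you would replace with your own server's values.

import time
import statistics
import concurrent.futures

import requests

# Hypothetical inference endpoint -- substitute your model server's URL.
ENDPOINT = "http://localhost:8000/predict"
PAYLOAD = {"prompt": "Hello, world"}
NUM_REQUESTS = 50
CONCURRENCY = 5

def timed_request(_: int) -> float:
    """Send one request and return its latency in seconds."""
    start = time.perf_counter()
    requests.post(ENDPOINT, json=PAYLOAD, timeout=30)
    return time.perf_counter() - start

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = list(pool.map(timed_request, range(NUM_REQUESTS)))
elapsed = time.perf_counter() - start

print(f"p50 latency: {statistics.median(latencies):.3f}s")
print(f"p95 latency: {statistics.quantiles(latencies, n=20)[18]:.3f}s")
print(f"throughput:  {NUM_REQUESTS / elapsed:.1f} req/s")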

No commits in the last 6 months.

Use this if you need to objectively compare the speed and efficiency of different ways to deploy and serve your machine learning models.

Not ideal if you are looking to benchmark the training speed of your models or compare model accuracy.

machine-learning-operations model-deployment performance-testing AI-infrastructure large-language-models
No License · Stale (6 months) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 7 / 25
Maturity: 8 / 25
Community: 9 / 25


Stars: 28
Forks: 3
Language: Python
License: None
Last pushed: Jun 28, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tensorchord/inference-benchmark"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
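
The same request in Python, using only the standard library. This assumes the endpoint returns JSON (the curl example suggests a JSON API, but the response schema is not documented here), so the script simply pretty-prints whatever comes back.

import json
import urllib.request

# Same endpoint as the curl example above; no key needed up to 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/tensorchord/inference-benchmark"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# The response schema is an assumption; print it to inspect the fields.
print(json.dumps(data, indent=2))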