alibaba/InferSim

A Lightweight LLM Inference Performance Simulator

Quality score: 52/100 (Established)

This tool helps AI infrastructure engineers and model developers predict the performance of large language models (LLMs) on different GPU setups. You input your LLM's architecture details and GPU specifications, and it outputs key performance metrics: time-to-first-token (TTFT), time-per-output-token (TPOT), and overall throughput (tokens/GPU/second). It is aimed at anyone deploying and optimizing LLM inference systems.
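
For orientation, here is a toy sketch of how those three metrics relate for a single request. This is not InferSim's actual API; the function and numbers below are purely illustrative:

    def toy_metrics(prefill_s, decode_s_per_token, output_tokens, num_gpus):
        # TTFT is dominated by the prefill pass over the prompt.
        ttft = prefill_s
        # TPOT is the steady-state latency of each decode step.
        tpot = decode_s_per_token
        # Total request latency: first token plus the remaining decode steps.
        total_s = ttft + tpot * (output_tokens - 1)
        # Per-GPU throughput for this single request.
        throughput = output_tokens / total_s / num_gpus
        return ttft, tpot, throughput

    ttft, tpot, tput = toy_metrics(prefill_s=0.5, decode_s_per_token=0.02,
                                   output_tokens=256, num_gpus=8)
    print(f"TTFT={ttft:.2f}s  TPOT={tpot * 1000:.0f}ms  throughput={tput:.1f} tok/GPU/s")

A real simulator additionally models batching, parallelism strategy, and hardware limits; the sketch only shows how the reported numbers fit together.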

Use this if you need to understand how changes in LLM model design or GPU hardware will impact inference speed and efficiency, especially for multi-GPU or multi-node deployments.

Not ideal if you are a data scientist primarily focused on model training or fine-tuning, and not directly involved in the system-level deployment and performance optimization of LLM inference.

Tags: LLM deployment · AI infrastructure · GPU optimization · model performance analysis · system co-design
No package · No dependents
Maintenance: 10/25
Adoption: 8/25
Maturity: 15/25
Community: 19/25

The overall 52/100 score is the sum of these four category scores.


Stars: 65
Forks: 18
Language: Python
License: Apache-2.0
Last pushed: Mar 02, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/alibaba/InferSim"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
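
A minimal Python equivalent of the curl call above, assuming the endpoint returns JSON (its exact field names are not documented here, so the sketch simply prints the payload):

    import requests

    url = "https://pt-edge.onrender.com/api/v1/quality/transformers/alibaba/InferSim"
    resp = requests.get(url, timeout=10)  # anonymous access: 100 requests/day
    resp.raise_for_status()
    print(resp.json())  # field names depend on the API's schema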