harleyszhang/llm_counts
LLM theoretical performance analysis tool supporting parameter, FLOPs, memory, and latency analysis.
This tool helps AI engineers and researchers understand the theoretical performance limits of large language models (LLMs) on different GPUs. Given an LLM name, a GPU type, and parallelism and batch-size settings, it outputs a detailed analysis of parameter counts, compute (FLOPs), memory usage, and latency for both the prefill and decode stages. It is designed for those optimizing LLM deployments.
115 stars. No commits in the last 6 months.
Use this if you need to predict an LLM's performance and resource consumption on various hardware and parallelism configurations before actual deployment, helping you choose the most efficient setup.
Not ideal if you are looking for actual real-world inference benchmarks from a deployed system, as this provides theoretical analysis rather than empirical measurements.
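To make the kind of analysis concrete, here is a back-of-envelope sketch of the estimates such a tool automates: parameter count, prefill FLOPs, and weight memory for a dense transformer. The model shape numbers (Llama-2-7B-like) and the simplified formulas are illustrative assumptions, not llm_counts' actual implementation.

```python
# Rough theoretical estimates for a dense decoder-only transformer.
# All shape numbers below are assumptions (Llama-2-7B-like), for illustration.

hidden = 4096   # hidden size
layers = 32     # transformer layers
vocab = 32000   # vocabulary size
ffn = 11008     # FFN intermediate size (SwiGLU-style, 3 weight matrices)
seq = 2048      # prompt length (prefill)
batch = 1

# Parameters: attention projections (4 * h^2) + SwiGLU FFN (3 * h * ffn)
# per layer, plus input embedding and LM head (2 * vocab * h).
per_layer = 4 * hidden**2 + 3 * hidden * ffn
params = layers * per_layer + 2 * vocab * hidden
print(f"params ~ {params / 1e9:.2f} B")

# Prefill FLOPs: ~2 FLOPs per parameter per token for the matmuls,
# plus the attention-score terms, ~4 * layers * seq^2 * hidden.
flops = 2 * params * batch * seq + 4 * layers * batch * seq**2 * hidden
print(f"prefill FLOPs ~ {flops / 1e12:.1f} TFLOPs")

# Weight memory at fp16 (2 bytes per parameter).
print(f"weights ~ {2 * params / 2**30:.1f} GiB")
```

Estimates like these, divided by a GPU's peak TFLOPs and memory bandwidth, give the roofline-style latency bounds the tool reports for prefill (compute-bound) and decode (memory-bound) stages.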
Stars: 115
Forks: 10
Language: Python
License: —
Category: —
Last pushed: Jul 11, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/harleyszhang/llm_counts"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
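For scripted use, the same endpoint can be called from Python with only the standard library. The URL pattern below is inferred from the curl example above; the response schema is not documented here, so the sketch just prints the raw JSON.

```python
import json
import urllib.request

def quality_url(owner: str, repo: str) -> str:
    # Endpoint pattern taken from the curl example on this page (assumption:
    # the path segments are simply <owner>/<repo>).
    return f"https://pt-edge.onrender.com/api/v1/quality/transformers/{owner}/{repo}"

url = quality_url("harleyszhang", "llm_counts")
print(url)

# Uncomment to fetch live data (subject to the 100 requests/day limit):
# with urllib.request.urlopen(url, timeout=10) as resp:
#     data = json.load(resp)  # field names depend on the API
#     print(data)
```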
Higher-rated alternatives
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...