cli99/llm-analysis

Latency and Memory Analysis of Transformer Models for Training and Inference

Score: 46 / 100 (Emerging)

This tool helps machine learning engineers and researchers estimate the latency and memory usage of Large Language Models (LLMs) like Transformers. By inputting details about your chosen model, GPU hardware, data type, and parallel processing setup, you receive predictions for how long training or inference will take and how much memory it will consume. It's designed for anyone planning or optimizing LLM deployments to understand performance before running costly experiments.

479 stars. No commits in the last 6 months.

Use this if you need to quickly assess different LLM configurations (model, GPU, parallelism) to predict training or inference time and memory requirements without running actual hardware tests.

Not ideal if you need precise, real-world measurements of your specific LLM implementation on actual hardware, as this provides theoretical estimates rather than empirical benchmarks.
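To illustrate the kind of back-of-envelope estimate such a tool produces, here is a minimal sketch. It is not llm-analysis's actual API; the numbers (7B parameters, fp16 weights, ~2 TB/s GPU memory bandwidth) are illustrative assumptions. It computes weight memory and the per-token decode latency floor imposed by reading every weight byte once.

```python
def weight_memory_bytes(n_params: int, bytes_per_param: int = 2) -> int:
    """Memory needed to hold the model weights alone (fp16/bf16 -> 2 bytes each)."""
    return n_params * bytes_per_param

def decode_latency_lower_bound_s(n_params: int,
                                 bytes_per_param: int = 2,
                                 mem_bandwidth_bytes_per_s: float = 2.0e12) -> float:
    """Per-token decode latency floor: each generated token must stream all
    weight bytes through GPU memory at least once (bandwidth-bound regime)."""
    return weight_memory_bytes(n_params, bytes_per_param) / mem_bandwidth_bytes_per_s

# Hypothetical 7B-parameter model in fp16 on a GPU with ~2 TB/s bandwidth:
print(weight_memory_bytes(7_000_000_000) / 1e9)          # weights in GB
print(decode_latency_lower_bound_s(7_000_000_000) * 1e3)  # ms per decoded token
```

Real tools like llm-analysis account for many more factors (activations, KV cache, parallelism, compute-bound prefill), which is why estimates like the above are only a floor, not a prediction.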

Tags: LLM deployment, model training, AI infrastructure, GPU optimization, performance estimation
Badges: Stale (6m) · No Package · No Dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 18 / 25


Stars: 479
Forks: 56
Language: Python
License: Apache-2.0
Last pushed: Apr 19, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/cli99/llm-analysis"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
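The same endpoint can be queried from Python. A small sketch, assuming only the URL layout shown in the curl example above (the response schema is not documented here, so it is parsed as generic JSON; `fetch_quality` is a hypothetical helper name):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(collection: str, owner: str, repo: str) -> str:
    # Path layout copied from the curl example; treated as-is, not guessed.
    return f"{BASE}/{collection}/{owner}/{repo}"

def fetch_quality(collection: str, owner: str, repo: str, timeout: float = 10.0) -> dict:
    # Anonymous access is rate-limited to 100 requests/day per the listing.
    with urllib.request.urlopen(quality_url(collection, owner, repo),
                                timeout=timeout) as resp:
        return json.load(resp)

print(quality_url("transformers", "cli99", "llm-analysis"))
```

Calling `fetch_quality("transformers", "cli99", "llm-analysis")` would retrieve the score data shown on this page, subject to the daily rate limit.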