argonne-lcf/LLM-Inference-Bench
This tool helps AI researchers and system architects understand how different large language models perform on various AI hardware accelerators. You input information about the LLM (like LLaMA or Mistral) and the hardware platform you're considering (like Nvidia GPUs, AMD GPUs, or Intel Habana), and it provides detailed performance metrics to help you select the most efficient configuration. It's designed for those who need to manage the computational demands of deploying LLMs for text-generation applications.
No commits in the last 6 months.
Use this if you need to find the combination of large language model, inference framework, and hardware accelerator that delivers the best performance and scalability for your AI applications.
Not ideal if you are an end-user of an LLM application and don't manage the underlying hardware or software infrastructure.
Stars: 60
Forks: 7
Language: Jupyter Notebook
License: BSD-3-Clause
Category:
Last pushed: Jul 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/argonne-lcf/LLM-Inference-Bench"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
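For scripted access, the same endpoint can be called from Python. Below is a minimal sketch using the requests library; the URL is taken verbatim from the curl example above, but the shape of the returned JSON is not documented on this page, so the script simply prints whatever the API sends back.

import requests

# Same endpoint as the curl example above.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/argonne-lcf/LLM-Inference-Bench"

resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail loudly on HTTP errors or rate limiting

# The exact JSON schema is not documented here, so just print the payload.
print(resp.json())

If you register a free key for the higher rate limit, the page does not show how to supply it; presumably it goes in a header or query parameter, so check the API documentation before relying on this sketch.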
Higher-rated alternatives
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...