RahulSChand/gpu_poor
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
This tool helps you estimate if your GPU can handle a specific Large Language Model (LLM) and how fast it will process text. You input details about the LLM, your GPU, and desired settings, and it tells you the required GPU memory and the approximate tokens generated per second. It's designed for anyone working with LLMs who needs to understand hardware constraints for running or fine-tuning these models.
1,396 stars. No commits in the last 6 months.
Use this if you need to quickly determine the GPU memory and processing speed for a large language model before attempting to run or fine-tune it.
Not ideal if you need perfectly precise, real-time measurements, as the tool provides estimations rather than exact, live performance metrics.
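The two quantities the tool reports can be sketched with back-of-the-envelope formulas: weight memory plus KV cache for the memory requirement, and a bandwidth-bound estimate for decode speed. This is an illustrative sketch only, not gpu_poor's actual model; the KV-cache layout, the ~10% overhead factor, and the example hardware numbers are assumptions.

```python
# Illustrative sketch of the kind of estimates gpu_poor produces.
# The constants (KV-cache layout, overhead factor) are assumptions,
# not the tool's exact formulas.

def estimate_inference_memory_gib(n_params, bits=16, n_layers=32, d_model=4096,
                                  context_len=2048, batch_size=1):
    """Rough GPU memory needed for inference, in GiB."""
    weight_bytes = n_params * bits / 8  # quantized weights
    # K and V caches: 2 tensors x layers x tokens x hidden dim, fp16 (2 bytes)
    kv_cache_bytes = 2 * n_layers * context_len * d_model * batch_size * 2
    overhead = 1.1  # assumed ~10% for activations and fragmentation
    return (weight_bytes + kv_cache_bytes) * overhead / 2**30

def estimate_decode_tokens_per_s(n_params, bits=16, mem_bandwidth_gbs=1008):
    """Decoding is typically memory-bandwidth bound: each generated token
    streams the weights from VRAM once, so tokens/s ~ bandwidth / weight bytes."""
    weight_bytes = n_params * bits / 8
    return mem_bandwidth_gbs * 1e9 / weight_bytes

# Example: a 7B-parameter model in fp16 on a GPU with ~1008 GB/s bandwidth
mem = estimate_inference_memory_gib(7e9)
tps = estimate_decode_tokens_per_s(7e9)
print(f"~{mem:.1f} GiB, ~{tps:.0f} tok/s")
```

Quantization (the llama.cpp/bnb/QLoRA modes the tool supports) enters through the `bits` parameter: dropping from 16-bit to 4-bit weights cuts both the memory footprint and the per-token weight traffic by roughly 4x.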
Stars: 1,396
Forks: 87
Language: JavaScript
License: —
Category:
Last pushed: Dec 03, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/RahulSChand/gpu_poor"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...