RahulSChand/gpu_poor
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
This tool helps you estimate if your GPU can handle a specific Large Language Model (LLM) and how fast it will process text. You input details about the LLM, your GPU, and desired settings, and it tells you the required GPU memory and the approximate tokens generated per second. It's designed for anyone working with LLMs who needs to understand hardware constraints for running or fine-tuning these models.
1,396 stars. No commits in the last 6 months.
Use this if you need to quickly determine the GPU memory and processing speed for a large language model before attempting to run or fine-tune it.
Not ideal if you need perfectly precise, real-time measurements, as the tool provides estimations rather than exact, live performance metrics.
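The two quantities the tool reports can be sketched with back-of-the-envelope formulas: weight memory plus KV cache for the memory requirement, and a bandwidth-bound estimate for decode speed. This is an illustrative sketch only, not gpu_poor's actual model; the KV-cache layout, the ~10% overhead factor, and the example hardware numbers are assumptions.

```python
# Illustrative sketch of the kind of estimates gpu_poor produces.
# The constants (KV-cache layout, overhead factor) are assumptions,
# not the tool's exact formulas.

def estimate_inference_memory_gib(n_params, bits=16, n_layers=32, d_model=4096,
                                  context_len=2048, batch_size=1):
    """Rough GPU memory needed for inference, in GiB."""
    weight_bytes = n_params * bits / 8  # quantized weights
    # K and V caches: 2 tensors x layers x tokens x hidden dim, fp16 (2 bytes)
    kv_cache_bytes = 2 * n_layers * context_len * d_model * batch_size * 2
    overhead = 1.1  # assumed ~10% for activations and fragmentation
    return (weight_bytes + kv_cache_bytes) * overhead / 2**30

def estimate_decode_tokens_per_s(n_params, bits=16, mem_bandwidth_gbs=1008):
    """Decoding is typically memory-bandwidth bound: each generated token
    streams the weights from VRAM once, so tokens/s ~ bandwidth / weight bytes."""
    weight_bytes = n_params * bits / 8
    return mem_bandwidth_gbs * 1e9 / weight_bytes

# Example: a 7B-parameter model in fp16 on a GPU with ~1008 GB/s bandwidth
mem = estimate_inference_memory_gib(7e9)
tps = estimate_decode_tokens_per_s(7e9)
print(f"~{mem:.1f} GiB, ~{tps:.0f} tok/s")
```

Quantization (the llama.cpp/bnb/QLoRA modes the tool supports) enters through the `bits` parameter: dropping from 16-bit to 4-bit weights cuts both the memory footprint and the per-token weight traffic by roughly 4x.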
Stars: 1,396
Forks: 87
Language: JavaScript
License: —
Category:
Last pushed: Dec 03, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/RahulSChand/gpu_poor"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...