erans/selfhostllm

A web-based calculator for estimating GPU memory requirements and maximum concurrent requests for self-hosted LLM inference.

Quality score: 42 / 100 (Emerging)

This tool helps you understand how many simultaneous requests your GPU setup can handle when running large language models (LLMs) on your own hardware. You input your GPU's memory, the LLM you want to use, and any quantization settings, and it estimates the maximum number of concurrent users or tasks your system can support. This is designed for IT professionals, ML engineers, or researchers who are deploying LLMs locally and need to plan their hardware resources.

Use this if you need to estimate the GPU memory required and the maximum concurrent users for self-hosting a large language model, ensuring efficient resource allocation.

Not ideal if you are using cloud-based LLM APIs or do not manage your own GPU infrastructure for inference.
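To make the planning concrete, here is a minimal sketch of the kind of arithmetic such an estimate involves: model weights plus per-request KV cache weighed against total VRAM. The function name, overhead constant, and example numbers are illustrative assumptions, not the calculator's actual formula.

```python
# Illustrative sketch of the arithmetic behind a concurrency estimate.
# The exact formula, constants, and overhead used by erans/selfhostllm
# may differ; every number below is an assumption for demonstration.

def estimate_max_concurrent_requests(
    gpu_memory_gb: float,            # total VRAM across GPUs
    model_params_b: float,           # model size in billions of parameters
    bytes_per_param: float,          # ~2.0 for FP16, ~0.5 for 4-bit quantization
    kv_cache_gb_per_request: float,  # KV cache for one request at your context length
    overhead_gb: float = 2.0,        # assumed framework/runtime overhead
) -> int:
    """Rough upper bound on simultaneous requests a GPU setup can serve."""
    weights_gb = model_params_b * bytes_per_param   # billions of params x bytes = GB
    free_for_kv_gb = gpu_memory_gb - weights_gb - overhead_gb
    if free_for_kv_gb <= 0:
        return 0  # the model itself does not fit in VRAM
    return int(free_for_kv_gb // kv_cache_gb_per_request)


# Example: 24 GB GPU, 7B model at 4-bit, ~1.2 GB of KV cache per request.
print(estimate_max_concurrent_requests(24, 7, 0.5, 1.2))
```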

Tags: LLM-deployment, GPU-resource-planning, on-premise-AI, ML-infrastructure, inference-scaling
No package published · No dependents
Maintenance 10 / 25
Adoption 7 / 25
Maturity 15 / 25
Community 10 / 25


Stars: 37
Forks: 4
Language: HTML
License: MIT
Last pushed: Feb 25, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/erans/selfhostllm"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
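If you prefer to fetch the same data from code rather than curl, a minimal Python sketch follows. It assumes the endpoint returns JSON and makes no assumption about the exact fields in the response; inspect the payload yourself.

```python
# Minimal sketch of calling the quality API from Python instead of curl.
# Assumes the endpoint returns JSON; the response schema is not documented
# here, so the script simply pretty-prints whatever comes back.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/erans/selfhostllm"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))
```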