ChiefGyk3D/FrankenLLM
Stitched-together GPUs, but it lives! Run different LLM models optimally across multiple NVIDIA GPUs
Maximize the use of your NVIDIA GPUs by running multiple large language models (LLMs) simultaneously, even if your GPUs have different memory capacities. This project lets you assign specific LLM models to specific GPUs, so each GPU serves its model independently and optimally. It's designed for anyone managing dedicated LLM servers or multi-GPU home lab machines who needs to serve different models efficiently.
Use this if you have multiple NVIDIA GPUs and want to run different LLM models on each of them concurrently, ensuring maximum hardware utilization and zero interference between models.
Not ideal if you only have a single GPU, or if your primary need is to run a single, extremely large model that spans multiple GPUs.
Stars: 9
Forks: 1
Language: Shell
License: —
Category: —
Last pushed: Feb 05, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/ChiefGyk3D/FrankenLLM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
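The endpoint above follows a predictable pattern, so the same query can be scripted for other repos. A minimal sketch, assuming the URL shape shown in the curl example; the helper name `pt_edge_url` is hypothetical, and the `llm-tools` category segment is taken from this repo's URL and may differ for repos in other categories:

```shell
# Hypothetical helper: build the quality-API URL for a given owner/repo.
# Assumes the "llm-tools" category segment seen in the example above.
pt_edge_url() {
  local owner="$1" repo="$2"
  printf 'https://pt-edge.onrender.com/api/v1/quality/llm-tools/%s/%s\n' "$owner" "$repo"
}

# Usage: print the URL, or pipe it straight into curl.
pt_edge_url ChiefGyk3D FrankenLLM
```

Pair it with curl in a loop to check several repos within the daily request limit.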
Higher-rated alternatives
- vllm-project/vllm-ascend: Community-maintained hardware plugin for vLLM on Ascend
- SemiAnalysisAI/InferenceX: Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X...
- kvcache-ai/Mooncake: Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
- uccl-project/uccl: UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache...
- sophgo/tpu-mlir: Machine learning compiler based on MLIR for Sophgo TPU.