ChiefGyk3D/FrankenLLM
Stitched-together GPUs, but it lives! Run different LLM models optimally across multiple NVIDIA GPUs
Maximize the use of your NVIDIA GPUs by running multiple large language models (LLMs) simultaneously, even if your GPUs have different memory capacities. This project lets you assign specific LLM models to specific GPUs, so each GPU serves its model independently and optimally. It's designed for anyone managing dedicated LLM servers or multi-GPU home lab machines who needs to serve different models efficiently.
Use this if you have multiple NVIDIA GPUs and want to run different LLM models on each of them concurrently, ensuring maximum hardware utilization and zero interference between models.
Not ideal if you only have a single GPU, or if your primary need is to run a single, extremely large model that spans multiple GPUs.
Stars: 9
Forks: 1
Language: Shell
License: —
Category: —
Last pushed: Feb 05, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/ChiefGyk3D/FrankenLLM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
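The endpoint above follows a predictable pattern, so the same query can be scripted for other repos. A minimal sketch, assuming the URL shape shown in the curl example; the helper name `pt_edge_url` is hypothetical, and the `llm-tools` category segment is taken from this repo's URL and may differ for repos in other categories:

```shell
# Hypothetical helper: build the quality-API URL for a given owner/repo.
# Assumes the "llm-tools" category segment seen in the example above.
pt_edge_url() {
  local owner="$1" repo="$2"
  printf 'https://pt-edge.onrender.com/api/v1/quality/llm-tools/%s/%s\n' "$owner" "$repo"
}

# Usage: print the URL, or pipe it straight into curl.
pt_edge_url ChiefGyk3D FrankenLLM
```

Pair it with curl in a loop to check several repos within the daily request limit.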
Higher-rated alternatives
- vllm-project/vllm-ascend: Community-maintained hardware plugin for vLLM on Ascend
- SemiAnalysisAI/InferenceX: Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X...
- kvcache-ai/Mooncake: Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
- uccl-project/uccl: UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache...
- sophgo/tpu-mlir: Machine learning compiler based on MLIR for Sophgo TPU.