raketenkater/llm-server
Smart launcher for llama.cpp / ik_llama.cpp: auto-detects GPUs, optimizes MoE placement, and provides crash recovery
This project simplifies running local large language models (LLMs) on your computer, especially with multiple GPUs. It automatically configures the LLM server for optimal performance based on your hardware, eliminating the need to manually adjust complex settings. Anyone who wants to run powerful AI models locally without becoming a command-line expert will find this tool useful.
Use this if you want to get the best possible speed and efficiency from local LLMs on your hardware, particularly with multiple graphics cards, without spending hours manually tweaking configurations.
Not ideal if you're comfortable with deep technical configuration and prefer fine-grained manual control over every server parameter.
Stars: 30
Forks: —
Language: Shell
License: MIT
Category: —
Last pushed: Mar 25, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/raketenkater/llm-server"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
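The endpoint path appears to follow a predictable pattern (category/owner/repo). A minimal helper, sketched under the assumption that this pattern holds for other repositories too, builds the URL programmatically:

```python
from urllib.parse import quote

# Base path taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-API URL for a repository.

    Each path segment is URL-escaped so names containing
    special characters do not break the request path.
    """
    return f"{BASE}/{quote(category, safe='')}/{quote(owner, safe='')}/{quote(repo, safe='')}"

# Reconstructs the example request from above:
print(quality_url("llm-tools", "raketenkater", "llm-server"))
```

The built URL can then be fetched with any HTTP client (e.g. the curl command shown above); the response format and the exact set of available categories are not documented here, so treat both as assumptions to verify against the live API.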
Higher-rated alternatives
containers/ramalama: RamaLama is an open-source developer tool that simplifies the local serving of AI models from...
av/harbor: One command brings a complete pre-wired LLM stack with hundreds of services to explore.
RunanywhereAI/runanywhere-sdks: Production-ready toolkit to run AI locally.
runpod-workers/worker-vllm: The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
foldl/chatllm.cpp: Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU).