raketenkater/llm-server
Smart launcher for llama.cpp / ik_llama.cpp: auto-detects GPUs, optimizes MoE placement, and provides crash recovery
This project simplifies running local large language models (LLMs) on your computer, especially with multiple GPUs. It automatically configures the LLM server for optimal performance based on your hardware, eliminating the need to manually adjust complex settings. Anyone who wants to run powerful AI models locally without becoming a command-line expert will find this tool useful.
Use this if you want to get the best possible speed and efficiency from local LLMs on your hardware, particularly with multiple graphics cards, without spending hours manually tweaking configurations.
Not ideal if you're comfortable with deep technical configuration and prefer fine-grained manual control over every server parameter.
Stars: 30
Forks: —
Language: Shell
License: MIT
Category: —
Last pushed: Mar 25, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/raketenkater/llm-server"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
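The endpoint path appears to follow a predictable pattern (category/owner/repo). A minimal helper, sketched under the assumption that this pattern holds for other repositories too, builds the URL programmatically:

```python
from urllib.parse import quote

# Base path taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-API URL for a repository.

    Each path segment is URL-escaped so names containing
    special characters do not break the request path.
    """
    return f"{BASE}/{quote(category, safe='')}/{quote(owner, safe='')}/{quote(repo, safe='')}"

# Reconstructs the example request from above:
print(quality_url("llm-tools", "raketenkater", "llm-server"))
```

The built URL can then be fetched with any HTTP client (e.g. the curl command shown above); the response format and the exact set of available categories are not documented here, so treat both as assumptions to verify against the live API.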
Higher-rated alternatives
containers/ramalama: RamaLama is an open-source developer tool that simplifies the local serving of AI models from...
av/harbor: One command brings a complete pre-wired LLM stack with hundreds of services to explore.
RunanywhereAI/runanywhere-sdks: Production-ready toolkit to run AI locally.
runpod-workers/worker-vllm: The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
foldl/chatllm.cpp: Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU).