raketenkater/llm-server

Smart launcher for llama.cpp / ik_llama.cpp — auto-detects GPUs, optimizes MoE placement, and recovers from crashes

Quality score: 31 / 100 (Emerging)

This project simplifies running local large language models (LLMs) on your computer, especially with multiple GPUs. It automatically configures the LLM server for optimal performance based on your hardware, eliminating the need to manually adjust complex settings. Anyone who wants to run powerful AI models locally without becoming a command-line expert will find this tool useful.

Use this if you want to get the best possible speed and efficiency from local LLMs on your hardware, particularly with multiple graphics cards, without spending hours manually tweaking configurations.

Not ideal if you're comfortable with deep technical configuration and prefer fine-grained manual control over every server parameter.

Tags: local-ai-deployment · llm-inference · gpu-optimization · ai-performance · machine-learning-operations
No package · No dependents
Maintenance: 13 / 25
Adoption: 7 / 25
Maturity: 11 / 25
Community: 0 / 25

The four category scores sum to the overall score: 13 + 7 + 11 + 0 = 31.


Stars: 30
Forks:
Language: Shell
License: MIT
Last pushed: Mar 25, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/raketenkater/llm-server"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
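
For scripted use, here's a minimal sketch that fetches the record and pretty-prints the response with jq. It assumes the endpoint returns JSON and that jq is installed; the X-API-Key header name shown in the comment is an assumption, since the key mechanism isn't documented here.

#!/usr/bin/env sh
# Fetch the quality record for raketenkater/llm-server.
URL="https://pt-edge.onrender.com/api/v1/quality/llm-tools/raketenkater/llm-server"

# Anonymous access allows 100 requests/day. With a free key (1,000/day),
# pass it in a header; the header name "X-API-Key" is an assumption:
#   curl -s -H "X-API-Key: $PT_EDGE_KEY" "$URL"

curl -s "$URL" | jq '.'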