EvilFreelancer/docker-llama.cpp-rpc
This project is based on llama.cpp and compiles only the RPC server, along with auxiliary utilities that run in RPC-client mode, which are required for distributed inference of Large Language Models (LLMs) and embedding models converted to GGUF format.
This project helps you run large language models and embedding models on your own servers without needing powerful hardware on a single machine. You provide your GGUF-formatted models, and it gives you a distributed system that can serve text completions or embeddings via a simple API. This is ideal for developers or system administrators integrating AI capabilities into their applications.
No commits in the last 6 months.
Use this if you need to serve large language models or embedding models efficiently across multiple CPU and GPU servers, making the most of your existing hardware.
Not ideal if you're looking for a user-friendly application to interact with LLMs directly, as this tool is focused on backend infrastructure.
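As a sketch of how such a distributed setup is typically wired together, the fragment below shows two RPC backends and one llama.cpp server that fans computation out to them. The image tag, service names, ports, and model path are illustrative assumptions, not taken from this repository; the `rpc-server` binary and the `--rpc` flag of `llama-server` are standard llama.cpp components.

```yaml
# Hypothetical docker-compose sketch (names and paths are assumptions).
# Two rpc-server backends hold model layers; llama-server coordinates
# inference across them over RPC and exposes an HTTP API.
services:
  rpc-backend-1:
    image: evilfreelancer/llama.cpp-rpc:latest   # assumed image tag
    command: ["rpc-server", "--host", "0.0.0.0", "--port", "50052"]
  rpc-backend-2:
    image: evilfreelancer/llama.cpp-rpc:latest
    command: ["rpc-server", "--host", "0.0.0.0", "--port", "50052"]
  llama-server:
    image: evilfreelancer/llama.cpp-rpc:latest
    command: [
      "llama-server",
      "-m", "/models/model.gguf",              # assumed model path
      "--rpc", "rpc-backend-1:50052,rpc-backend-2:50052",
      "--host", "0.0.0.0",
      "--port", "8080"
    ]
    ports:
      - "8080:8080"
```

Clients would then send completion or embedding requests to `llama-server` on port 8080, while the actual computation is split across the RPC backends.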
Stars: 23
Forks: 5
Language: Shell
License: MIT
Last pushed: May 25, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/EvilFreelancer/docker-llama.cpp-rpc"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Higher-rated alternatives
containers/ramalama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from...
av/harbor
One command brings a complete pre-wired LLM stack with hundreds of services to explore.
RunanywhereAI/runanywhere-sdks
Production-ready toolkit to run AI locally
runpod-workers/worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
foldl/chatllm.cpp
Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)