EvilFreelancer/docker-llama.cpp-rpc
This project is based on llama.cpp and compiles only the RPC server, along with auxiliary utilities that run in RPC-client mode, which are required for distributed inference of Large Language Models (LLMs) and embedding models converted to GGUF format.
This project helps you run large language models and embedding models on your own servers without needing powerful hardware on a single machine. You provide your GGUF-formatted models, and it gives you a distributed system that can serve text completions or embeddings via a simple API. This is ideal for developers or system administrators integrating AI capabilities into their applications.
No commits in the last 6 months.
Use this if you need to serve large language models or embedding models efficiently across multiple CPU and GPU servers, making the most of your existing hardware.
Not ideal if you're looking for a user-friendly application to interact with LLMs directly, as this tool is focused on backend infrastructure.
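As a sketch of how such a distributed setup is typically wired together, the fragment below shows two RPC backends and one llama.cpp server that fans computation out to them. The image tag, service names, ports, and model path are illustrative assumptions, not taken from this repository; the `rpc-server` binary and the `--rpc` flag of `llama-server` are standard llama.cpp components.

```yaml
# Hypothetical docker-compose sketch (names and paths are assumptions).
# Two rpc-server backends hold model layers; llama-server coordinates
# inference across them over RPC and exposes an HTTP API.
services:
  rpc-backend-1:
    image: evilfreelancer/llama.cpp-rpc:latest   # assumed image tag
    command: ["rpc-server", "--host", "0.0.0.0", "--port", "50052"]
  rpc-backend-2:
    image: evilfreelancer/llama.cpp-rpc:latest
    command: ["rpc-server", "--host", "0.0.0.0", "--port", "50052"]
  llama-server:
    image: evilfreelancer/llama.cpp-rpc:latest
    command: [
      "llama-server",
      "-m", "/models/model.gguf",              # assumed model path
      "--rpc", "rpc-backend-1:50052,rpc-backend-2:50052",
      "--host", "0.0.0.0",
      "--port", "8080"
    ]
    ports:
      - "8080:8080"
```

Clients would then send completion or embedding requests to `llama-server` on port 8080, while the actual computation is split across the RPC backends.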
Stars: 23
Forks: 5
Language: Shell
License: MIT
Last pushed: May 25, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/EvilFreelancer/docker-llama.cpp-rpc"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Higher-rated alternatives
containers/ramalama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from...
av/harbor
One command brings a complete pre-wired LLM stack with hundreds of services to explore.
RunanywhereAI/runanywhere-sdks
Production-ready toolkit to run AI locally
runpod-workers/worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
foldl/chatllm.cpp
Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)