b4rtaz/distributed-llama
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
This project helps anyone who wants to run large language models (LLMs) on their own hardware, but finds them too slow. By connecting multiple home devices like PCs or Raspberry Pis into a single cluster, you can significantly speed up the process of generating text from LLMs. It takes an LLM and a prompt as input, and outputs the generated text much faster than a single device could.
Use this if you want to run large language models locally and need to speed them up by pooling the compute of several machines you already own.
Not ideal if you have only one device, or if you don't need to run very large models that demand significant computational power.
Stars: 2,856
Forks: 215
Language: C++
License: MIT
Category:
Last pushed: Feb 10, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/b4rtaz/distributed-llama"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
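Judging by the example URL above, the endpoint appears to follow a `/api/v1/quality/transformers/{owner}/{repo}` pattern. A minimal sketch for building the URL for any repository and fetching it from Python, assuming the response body is JSON (the response schema is not documented here):

```python
import json
import urllib.request

# Base path taken from the example curl command above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-data endpoint URL for a given GitHub repo."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the response (assumes the API returns JSON)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# The URL for this repository:
print(quality_url("b4rtaz", "distributed-llama"))
```

Without an API key this is limited to the 100 requests/day mentioned above, so cache responses rather than calling the endpoint in a loop.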
Related models
Blaizzy/mlx-vlm
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac...
armbues/SiLLM
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple...
microsoft/batch-inference
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.
armbues/SiLLM-examples
Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on...
kolinko/effort
An implementation of bucketMul LLM inference