b4rtaz/distributed-llama

Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.

Score: 55 / 100 (Established)

This project helps anyone who wants to run large language models (LLMs) on their own hardware but finds them too slow. By connecting multiple home devices, such as PCs or Raspberry Pis, into a single cluster, you can significantly speed up text generation. It takes an LLM and a prompt as input and produces the generated text much faster than a single device could.


Use this if you want to run large language models locally and need to speed up inference by combining several machines you already own.

Not ideal if you have only one device available, or if you don't need to run models large enough to demand significant computational power.

Tags: local LLM inference, home AI setup, distributed computing, text generation speed-up, personal AI projects
Package: none. Dependents: none.
Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 19 / 25

How are scores calculated?
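Here, the four subscores (each out of 25) sum to the overall score: 10 + 10 + 16 + 19 = 55, matching the 55 / 100 shown above.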

Stars: 2,856
Forks: 215
Language: C++
License: MIT
Last pushed: Feb 10, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/b4rtaz/distributed-llama"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
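
To consume the same endpoint programmatically, here is a minimal Python sketch. It assumes only that the endpoint returns JSON; the response field names are not documented here, so it prints the raw payload rather than relying on a specific schema.

import json
import urllib.request

# Same public quality endpoint as the curl example above;
# no API key needed for up to 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/b4rtaz/distributed-llama"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)  # assumes the endpoint returns JSON

# Print the payload to inspect the actual schema before
# depending on specific field names.
print(json.dumps(data, indent=2))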