b4rtaz/distributed-llama
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
This project helps anyone who wants to run large language models (LLMs) on their own hardware, but finds them too slow. By connecting multiple home devices like PCs or Raspberry Pis into a single cluster, you can significantly speed up the process of generating text from LLMs. It takes an LLM and a prompt as input, and outputs the generated text much faster than a single device could.
Use this if you want to run large language models locally and need to speed them up by pooling the compute of several machines you already own.
Not ideal if you have only one device, or if you don't need to run very large models that demand significant computational power.
Stars: 2,856
Forks: 215
Language: C++
License: MIT
Category:
Last pushed: Feb 10, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/b4rtaz/distributed-llama"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
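Judging by the example URL above, the endpoint appears to follow a `/api/v1/quality/transformers/{owner}/{repo}` pattern. A minimal sketch for building the URL for any repository and fetching it from Python, assuming the response body is JSON (the response schema is not documented here):

```python
import json
import urllib.request

# Base path taken from the example curl command above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-data endpoint URL for a given GitHub repo."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the response (assumes the API returns JSON)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# The URL for this repository:
print(quality_url("b4rtaz", "distributed-llama"))
```

Without an API key this is limited to the 100 requests/day mentioned above, so cache responses rather than calling the endpoint in a loop.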
Related models
Blaizzy/mlx-vlm
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac...
armbues/SiLLM
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple...
microsoft/batch-inference
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.
armbues/SiLLM-examples
Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on...
kolinko/effort
An implementation of bucketMul LLM inference