ai-decentralized/BloomBee

Decentralized LLM fine-tuning and inference with offloading

Score: 55 / 100 (Established)

This project helps AI developers and researchers run large language models (LLMs) without a single, powerful GPU. It shards large models such as LLaMA across multiple machines, so several modest GPUs can each host a portion of the network. The result is a functional, distributed LLM for inference or fine-tuning, accessible to anyone building AI applications or doing model research.
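To make the idea concrete, here is a minimal sketch of the layer-partitioning step behind this kind of pipeline parallelism. This is an illustration of the general technique, not BloomBee's actual API: each worker is assigned a contiguous slice of the model's transformer layers, and a forward pass chains activations through the workers in order.

```python
# Hedged sketch of pipeline-parallel layer splitting (not BloomBee's real API).
# Each worker hosts a contiguous, near-equal slice of the model's layers.

def partition_layers(n_layers, n_workers):
    """Split layer indices 0..n_layers-1 into contiguous, near-equal slices."""
    base, extra = divmod(n_layers, n_workers)
    slices, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # earlier workers absorb the remainder
        slices.append(range(start, start + size))
        start += size
    return slices

# e.g. an 8-layer model spread over 3 modest machines:
print([list(s) for s in partition_layers(8, 3)])  # → [[0, 1, 2], [3, 4, 5], [6, 7]]
```

In a real deployment each slice would live on a different host, with activations streamed between hosts at the slice boundaries.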

111 stars.

Use this if you need to run or fine-tune very large language models but only have access to several machines with modest GPU resources, not one extremely powerful GPU.

Not ideal if you already have access to a single, high-end GPU or a dedicated cluster powerful enough to run your LLMs efficiently without decentralization.

Tags: distributed-AI, LLM-deployment, AI-infrastructure, model-fine-tuning, resource-optimization
No package published · No dependents
Maintenance: 10 / 25
Adoption: 9 / 25
Maturity: 16 / 25
Community: 20 / 25


Stars: 111
Forks: 23
Language: Python
License: Apache-2.0
Last pushed: Mar 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ai-decentralized/BloomBee"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
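The same endpoint can be called from Python instead of curl. A minimal sketch, assuming only the URL shown above; the `quality_url` helper is hypothetical, and the response schema is not documented here, so the fetch line is left commented out.

```python
# Hedged sketch: build and (optionally) call the quality API URL from Python.
# The base URL comes from the curl example above; everything else is illustrative.
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem, owner, repo):
    """Build the API URL for a repository's quality data (hypothetical helper)."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

url = quality_url("transformers", "ai-decentralized", "BloomBee")
print(url)
# data = json.load(urlopen(url))  # uncomment to fetch (network required;
#                                 # response fields are not documented here)
```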