lyogavin/airllm
AirLLM 70B inference with single 4GB GPU
This project helps AI developers and researchers run powerful Large Language Models (LLMs) on hardware with limited GPU memory. It can run inference with a model as large as Llama 3.1 405B on a single 8GB GPU, so you can deploy sophisticated AI capabilities without expensive, high-end graphics cards, making advanced LLMs more accessible.
13,828 stars. Used by 2 other packages. Available on PyPI.
Use this if you need to run large language models for text generation or other inference tasks on devices with constrained GPU memory, like a 4GB or 8GB GPU.
Not ideal if you already have access to high-end GPUs with ample memory or if you are focused on training LLMs rather than just running them.
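AirLLM's documented approach is layered inference: rather than holding every transformer layer in GPU memory at once, it loads one layer's weights at a time, applies it, and frees it before loading the next. Below is a minimal NumPy sketch of that idea; the function names and file layout are illustrative assumptions, not AirLLM's real API.

```python
# Sketch of layered inference: each layer's weights live on disk and are
# loaded into memory only while that layer runs. Names here are hypothetical,
# not AirLLM's actual interface.
import numpy as np


def save_layers(path_prefix, num_layers, dim, rng):
    """Persist each layer's weight matrix to its own file,
    simulating a sharded checkpoint."""
    for i in range(num_layers):
        w = rng.standard_normal((dim, dim)).astype(np.float32)
        np.save(f"{path_prefix}_{i}.npy", w)


def layered_forward(path_prefix, num_layers, x):
    """Forward pass that keeps at most one layer's weights in memory."""
    for i in range(num_layers):
        w = np.load(f"{path_prefix}_{i}.npy")  # load only this layer
        x = np.tanh(x @ w)                     # apply it
        del w                                  # release before the next load
    return x
```

Peak memory is bounded by the largest single layer plus the activations, which is why a 70B- or 405B-parameter model can fit through a 4GB or 8GB GPU, at the cost of repeated disk reads per token.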
Stars
13,828
Forks
1,368
Language
Jupyter Notebook
License
Apache-2.0
Category
Last pushed
Mar 10, 2026
Commits (30d)
0
Dependencies
8
Reverse dependents
2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/lyogavin/airllm"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
Compare
Related models
shibing624/MedicalGPT
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline....
GradientHQ/parallax
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere
CrazyBoyM/llama3-Chinese-chat
Chinese post-training repository for Llama3 and Llama3.1: fine-tuned and modified weights, plus tutorial videos and docs for training, inference, evaluation, and deployment.
CLUEbenchmark/CLUE
Chinese Language Understanding Evaluation Benchmark (CLUE): datasets, baselines, pre-trained...
MediaBrain-SJTU/MING
MING (明医): a Chinese large language model for medical consultation