lyogavin/airllm
AirLLM 70B inference with single 4GB GPU
This project helps AI developers and researchers run powerful Large Language Models (LLMs) on hardware with limited GPU memory. It can run inference with a model as large as Llama 3.1 405B on a single 8GB GPU, so you can deploy sophisticated AI capabilities without expensive, high-end graphics cards, making advanced LLMs more accessible.
13,828 stars. Used by 2 other packages. Available on PyPI.
Use this if you need to run large language models for text generation or other inference tasks on devices with constrained GPU memory, like a 4GB or 8GB GPU.
Not ideal if you already have access to high-end GPUs with ample memory or if you are focused on training LLMs rather than just running them.
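AirLLM's documented approach is layered inference: rather than holding every transformer layer in GPU memory at once, it loads one layer's weights at a time, applies it, and frees it before loading the next. Below is a minimal NumPy sketch of that idea; the function names and file layout are illustrative assumptions, not AirLLM's real API.

```python
# Sketch of layered inference: each layer's weights live on disk and are
# loaded into memory only while that layer runs. Names here are hypothetical,
# not AirLLM's actual interface.
import numpy as np


def save_layers(path_prefix, num_layers, dim, rng):
    """Persist each layer's weight matrix to its own file,
    simulating a sharded checkpoint."""
    for i in range(num_layers):
        w = rng.standard_normal((dim, dim)).astype(np.float32)
        np.save(f"{path_prefix}_{i}.npy", w)


def layered_forward(path_prefix, num_layers, x):
    """Forward pass that keeps at most one layer's weights in memory."""
    for i in range(num_layers):
        w = np.load(f"{path_prefix}_{i}.npy")  # load only this layer
        x = np.tanh(x @ w)                     # apply it
        del w                                  # release before the next load
    return x
```

Peak memory is bounded by the largest single layer plus the activations, which is why a 70B- or 405B-parameter model can fit through a 4GB or 8GB GPU, at the cost of repeated disk reads per token.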
Stars
13,828
Forks
1,368
Language
Jupyter Notebook
License
Apache-2.0
Category
Last pushed
Mar 10, 2026
Commits (30d)
0
Dependencies
8
Reverse dependents
2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/lyogavin/airllm"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
Compare
Related models
shibing624/MedicalGPT
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline....
GradientHQ/parallax
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere
CrazyBoyM/llama3-Chinese-chat
Chinese post-training repository for Llama3 and Llama3.1: fine-tuned and modified weights, plus tutorial videos and docs for training, inference, evaluation, and deployment.
CLUEbenchmark/CLUE
Chinese Language Understanding Evaluation Benchmark (CLUE): datasets, baselines, pre-trained...
MediaBrain-SJTU/MING
MING (明医): a Chinese large language model for medical consultation