lyogavin/airllm

AirLLM 70B inference with single 4GB GPU

Quality score: 67/100 (Established)

This project helps AI developers and researchers run powerful Large Language Models (LLMs) on hardware with limited GPU memory. By loading model layers from disk one at a time during inference instead of holding the whole model in VRAM, it can run a model as large as Llama 3.1 405B for text generation on a single 8GB GPU. This means you can deploy sophisticated AI capabilities without needing expensive, high-end graphics cards, making advanced LLMs more accessible.
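The core idea of layer-by-layer inference can be illustrated with a minimal sketch. This is not AirLLM's actual API or implementation; all names here are hypothetical, and NumPy matrices stand in for real transformer layers. The point is that peak memory stays at roughly one layer, regardless of how many layers the model has.

```python
import numpy as np

# Hypothetical sketch of sharded, layer-by-layer inference:
# each layer's weights are loaded just before use and freed right
# after, so only one layer is ever resident at a time.

rng = np.random.default_rng(0)
N_LAYERS, DIM = 4, 8

# Stand-in for per-layer checkpoint files on disk.
layer_store = {i: rng.standard_normal((DIM, DIM)) for i in range(N_LAYERS)}

def load_layer(i):
    """Simulates reading one layer's weights from disk into memory."""
    return layer_store[i]

def forward(x):
    for i in range(N_LAYERS):
        w = load_layer(i)   # only this layer is "resident"
        x = np.tanh(x @ w)  # apply the layer
        del w               # release it before loading the next
    return x

out = forward(rng.standard_normal(DIM))
print(out.shape)  # (8,)
```

The trade-off is speed: every forward pass re-reads each layer from storage, which is why this approach makes large models runnable on small GPUs rather than fast.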

13,828 stars. Used by 2 other packages. Available on PyPI.

Use this if you need to run large language models for text generation or other inference tasks on devices with constrained GPU memory, like a 4GB or 8GB GPU.

Not ideal if you already have access to high-end GPUs with ample memory or if you are focused on training LLMs rather than just running them.

Tags: AI model deployment, LLM inference, edge AI, resource optimization, text generation
Maintenance: 10/25
Adoption: 12/25
Maturity: 25/25
Community: 20/25


Stars: 13,828
Forks: 1,368
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Mar 10, 2026
Commits (30d): 0
Dependencies: 8
Reverse dependents: 2

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/lyogavin/airllm"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
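The same endpoint can be called from Python using only the standard library. This is a hedged equivalent of the curl command above: the URL structure is taken verbatim from it, but the helper name and the response format are assumptions, so check the API docs before relying on them.

```python
# Hypothetical helper for the quality-score endpoint shown above.
# The base URL and path structure come from the curl example; the
# function name and parameters are illustrative, not an official API.
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, repo: str) -> str:
    """Builds the endpoint URL for a given ecosystem and owner/repo pair."""
    return f"{BASE}/{quote(ecosystem)}/{quote(repo, safe='/')}"

url = quality_url("transformers", "lyogavin/airllm")
print(url)
# To actually fetch the JSON (requires network access):
# data = json.loads(urlopen(url).read())
```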