Tiiny-AI/PowerInfer

High-speed Large Language Model Serving for Local Deployment

Established

PowerInfer runs large language models locally on a personal computer with a single consumer-grade GPU. Given a model file and a prompt, it generates responses quickly, so individuals and small businesses can use capable models without expensive server hardware. It suits researchers, developers, and anyone who needs to run LLMs privately and quickly on their own machine.

8,808 stars.

Use this if you need to run large language models on your personal computer with a standard GPU and want significantly faster response times.

Not ideal if you are looking for a cloud-based LLM solution or if you only have a CPU and do not require major performance boosts.

AI-on-device local-LLM-deployment personal-AI consumer-AI edge-AI

No Package, No Dependents

Maintenance 10 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 18 / 25

Stars: 8,808

Forks: 501

Language: C++

License: MIT

Related models

vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

sgl-project/sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

alibaba/MNN

MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...

xorbitsai/inference

Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...

tensorzero/tensorzero

TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...
