Tiiny-AI/PowerInfer
High-speed Large Language Model Serving for Local Deployment
PowerInfer runs large language models directly on a personal computer with a single consumer-grade graphics card. Given a model file and your input, it generates responses locally, so individuals and small businesses can use powerful models without expensive server hardware. It suits researchers, developers, and anyone who needs fast, private LLM inference on their own machine.
Use this if you need to run large language models on your personal computer with a standard GPU and want significantly faster response times.
Not ideal if you are looking for a cloud-based LLM solution or if you only have a CPU and do not require major performance boosts.
Stars: 8,808
Forks: 501
Language: C++
License: MIT
Category:
Last pushed: Jan 24, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Tiiny-AI/PowerInfer"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
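For scripted use, the same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the URL path follows the shape seen in the curl example (a category segment such as `transformers`, then owner, then repository) and that the response body is JSON; neither the response schema nor the way an API key is supplied is documented here, so the sketch does not touch either. The names `quality_url` and `fetch_quality` are hypothetical helpers, not part of any published client.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, assuming a /<category>/<owner>/<repo> path shape."""
    # Percent-encode each segment so unusual characters cannot break the path.
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch one record. Assumes the body is JSON; the schema is not documented here."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Printing the URL costs nothing; calling fetch_quality() uses one request
    # from the daily allowance.
    print(quality_url("transformers", "Tiiny-AI", "PowerInfer"))
```

Because the keyless allowance is 100 requests/day, a script that polls several repositories may want to cache results locally rather than re-fetch on every run.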
Related models
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...