Tiiny-AI/PowerInfer

High-speed Large Language Model Serving for Local Deployment

Score: 54 / 100 (Established)

PowerInfer helps you run large AI language models directly on your personal computer using a single consumer-grade graphics card, making them faster and more accessible. It takes a model file and your input, then rapidly generates responses, allowing individuals or small businesses to use powerful AI locally without needing expensive server hardware. This is ideal for researchers, developers, or anyone needing to run LLMs privately and quickly on their own machine.

8,808 stars.

Use this if you need to run large language models on your personal computer with a standard GPU and want significantly faster response times.

Not ideal if you are looking for a cloud-based LLM solution or if you only have a CPU and do not require major performance boosts.

AI-on-device, local-LLM-deployment, personal-AI, consumer-AI, edge-AI
No package · No dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 18 / 25

How are scores calculated?
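The four subscores listed above (each out of 25) appear to add up to the overall score. A minimal sketch, assuming the overall score is the simple sum of the subscores shown on this page:

```python
# Subscores as shown on this page, each out of 25.
# Assumption: the overall score is their plain sum (the page does not
# document the formula, but the numbers are consistent with it).
subscores = {
    "Maintenance": 10,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 18,
}

total = sum(subscores.values())
print(total)  # 54, matching the 54 / 100 shown above
```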

Stars: 8,808
Forks: 501
Language: C++
License: MIT
Last pushed: Jan 24, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Tiiny-AI/PowerInfer"

Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
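The same endpoint can be called from Python instead of curl. This is a minimal sketch: the URL structure follows the curl example above, but the `fetch_quality` helper and any response field names are illustrative assumptions, not a documented client or schema.

```python
# Sketch of calling the quality API from Python's standard library.
# The URL mirrors the curl example above; everything else is illustrative.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(platform: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for a given repository."""
    return f"{BASE}/{platform}/{owner}/{repo}"


def fetch_quality(platform: str, owner: str, repo: str) -> dict:
    """Fetch and decode the JSON quality report (requires network access)."""
    with urllib.request.urlopen(quality_url(platform, owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Matches the curl example above; fetch_quality(...) would perform
    # the actual request.
    print(quality_url("transformers", "Tiiny-AI", "PowerInfer"))
```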