justADeni/intel-npu-llm
A simple Python script for running LLMs on Intel's Neural Processing Units (NPUs)
This project helps developers run large language models (LLMs) locally on NPU-equipped Intel processors. It takes a pre-trained LLM and prepares it to run efficiently on the NPU, providing a ready-to-use local model for AI-powered applications. It's intended for developers building applications that need local LLM inference, especially on devices with Intel Core Ultra processors.
Use this if you are a developer looking to deploy large language models on Intel NPU-equipped devices for faster and more power-efficient local inference in your applications.
Not ideal if you don't have an Intel processor with an NPU or if you are not a developer and simply want to use an off-the-shelf AI chat application.
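The repository's own entry point is not shown on this page, so the following is only a minimal sketch of the general approach such a script takes, assuming the `openvino-genai` package (Intel's OpenVINO GenAI API) and a model directory already exported to OpenVINO IR format; the model path and function name are illustrative, not taken from the repo.

```python
# Hypothetical sketch: run a pre-converted LLM on an Intel NPU via OpenVINO GenAI.
# Requires `pip install openvino-genai` and an OpenVINO-format model directory
# (e.g. produced with `optimum-cli export openvino ...`). The import is guarded
# so the sketch loads even where the package or NPU hardware is absent.
try:
    import openvino_genai as ov_genai
except ImportError:
    ov_genai = None


def generate_on_npu(model_dir: str, prompt: str, max_new_tokens: int = 100) -> str:
    """Load an OpenVINO-format LLM on the NPU device and generate a completion."""
    if ov_genai is None:
        raise RuntimeError("openvino-genai is not installed")
    # "NPU" selects the Neural Processing Unit; "CPU" or "GPU" also work
    # as fallback devices on machines without an NPU.
    pipe = ov_genai.LLMPipeline(model_dir, "NPU")
    return pipe.generate(prompt, max_new_tokens=max_new_tokens)
```

A caller would then invoke something like `generate_on_npu("./TinyLlama-ov", "Hello!")`; on machines without an NPU, swapping the device string for `"CPU"` is a common fallback.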
Stars
35
Forks
3
Language
Python
License
MIT
Category
Last pushed
Oct 17, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/justADeni/intel-npu-llm"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
PaddlePaddle/FastDeploy
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
skyzh/tiny-llm
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny...
ServerlessLLM/ServerlessLLM
Serverless LLM Serving for Everyone.
AXERA-TECH/ax-llm
Explore LLM model deployment based on AXera's AI chips