NexaAI/nexa-sdk
Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Supporting OpenAI GPT-OSS, IBM Granite-4, Qwen-3-VL, Gemma-3n, Ministral-3, and more.
This is a tool for software developers who want to run advanced AI models directly on user devices such as phones, PCs, and IoT hardware, not just in the cloud. It takes your chosen large language or vision model and optimizes it to run efficiently across that hardware, delivering fast, low-energy, on-device AI capabilities in your applications.
7,797 stars. Maintained, with 1 commit in the last 30 days.
Use this if you are a developer creating applications that need to run cutting-edge AI models directly on user hardware like smartphones or embedded systems, rather than relying on cloud-based AI services.
Not ideal if you are a non-developer seeking an out-of-the-box AI application or if your primary need is cloud-based AI model inference.
Stars: 7,797
Forks: 956
Language: Kotlin
License: Apache-2.0
Category: (not listed)
Last pushed: Feb 26, 2026
Commits (30d): 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NexaAI/nexa-sdk"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
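The endpoint above returns JSON. A minimal sketch of calling it from Python, assuming field names such as stars and commits_30d (these are illustrative guesses, not documented by the API):

```python
import json
from urllib.request import urlopen

# Endpoint from the listing above; no API key needed for 100 requests/day.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/NexaAI/nexa-sdk"

def fetch_quality(url: str = API_URL) -> dict:
    """Fetch the quality record and parse its JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)

def summarize(record: dict) -> str:
    """One-line summary; 'stars' and 'commits_30d' are assumed field names."""
    stars = record.get("stars", "?")
    commits = record.get("commits_30d", "?")
    return f"stars={stars}, commits(30d)={commits}"

# Demonstrate formatting with a local sample record (avoids a live request):
sample = {"stars": 7797, "commits_30d": 1}
print(summarize(sample))  # stars=7797, commits(30d)=1
```

Swap the repo path in the URL to query other repositories tracked by the same service.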
Recent Releases
Related models
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...