NexaAI/nexa-sdk
Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Supporting OpenAI GPT-OSS, IBM Granite-4, Qwen-3-VL, Gemma-3n, Ministral-3, and more.
This is a tool for software developers who want to run advanced AI models directly on user devices such as phones, PCs, and IoT hardware, not just in the cloud. It takes your chosen large language or vision model and optimizes it to run efficiently across that hardware, delivering fast, low-energy, on-device AI capabilities in your applications.
7,797 stars. Maintained, with 1 commit in the last 30 days.
Use this if you are a developer creating applications that need to run cutting-edge AI models directly on user hardware like smartphones or embedded systems, rather than relying on cloud-based AI services.
Not ideal if you are a non-developer seeking an out-of-the-box AI application or if your primary need is cloud-based AI model inference.
Stars: 7,797
Forks: 956
Language: Kotlin
License: Apache-2.0
Category: (not listed)
Last pushed: Feb 26, 2026
Commits (30d): 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NexaAI/nexa-sdk"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
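The endpoint above returns JSON. A minimal sketch of calling it from Python, assuming field names such as stars and commits_30d (these are illustrative guesses, not documented by the API):

```python
import json
from urllib.request import urlopen

# Endpoint from the listing above; no API key needed for 100 requests/day.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/NexaAI/nexa-sdk"

def fetch_quality(url: str = API_URL) -> dict:
    """Fetch the quality record and parse its JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)

def summarize(record: dict) -> str:
    """One-line summary; 'stars' and 'commits_30d' are assumed field names."""
    stars = record.get("stars", "?")
    commits = record.get("commits_30d", "?")
    return f"stars={stars}, commits(30d)={commits}"

# Demonstrate formatting with a local sample record (avoids a live request):
sample = {"stars": 7797, "commits_30d": 1}
print(summarize(sample))  # stars=7797, commits(30d)=1
```

Swap the repo path in the URL to query other repositories tracked by the same service.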
Recent Releases
Related models
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...