vitoplantamura/OnnxStream
Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a Raspberry Pi Zero 2 (or in 298 MB of RAM), as well as Mistral 7B on desktops and servers. ARM, x86, WASM, and RISC-V are supported, with acceleration by XNNPACK. Python, C#, and JS (WASM) bindings available.
This project helps hobbyists and specialized professionals run complex AI models, such as Stable Diffusion for image generation, large language models (LLMs) for text, or YOLO for object detection, on resource-constrained devices like a Raspberry Pi or in web browsers. It executes trained models with very little memory, producing images, text, or detected objects even on hardware with limited RAM. It's aimed at users who need to deploy advanced AI capabilities efficiently on small, low-power machines or directly within web applications.
Use this if you need to run powerful AI models on hardware with very limited memory, such as single-board computers or directly in a web browser, without sacrificing the quality of the model's output.
Not ideal if you are primarily focused on maximizing inference speed or throughput on high-end hardware, where memory consumption is not a concern.
Stars: 2,031
Forks: 89
Language: C++
License: —
Category:
Last pushed: Jan 20, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/vitoplantamura/OnnxStream"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related models
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...