jdaln/dgx-spark-inference-stack
Serve the home! An inference stack for your Nvidia DGX Spark, a.k.a. the Grace Blackwell AI supercomputer on your desk. Mostly vLLM-based for now.
This project helps owners of an Nvidia DGX Spark AI supercomputer run and serve large language models (LLMs) efficiently on their device. It takes various LLM model files as input and exposes an API endpoint for text completions and chat interactions. It is ideal for AI researchers, enthusiasts, or small businesses with a DGX Spark who want to use its capabilities for local AI inference.
Use this if you own an Nvidia DGX Spark and want to easily deploy and manage large language models for local inference, getting the most out of your hardware.
Not ideal if you do not have an Nvidia DGX Spark or are looking for a cloud-based LLM serving solution.
Stars: 26
Forks: 3
Language: JavaScript
License: Apache-2.0
Category:
Last pushed: Feb 24, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jdaln/dgx-spark-inference-stack"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
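If you would rather call the endpoint from code than from curl, here is a minimal Python sketch. The URL is taken from the curl example above; the response schema is not documented on this page, so the result is simply printed as raw JSON, and the helper function names are illustrative, not part of any official client.

```python
# Hedged sketch: querying the pt-edge quality API for a repository.
# Only the endpoint URL is taken from this page; everything else
# (function names, response handling) is an assumption.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality data (requires network access)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Prints whatever JSON the API returns; field names are undocumented here.
    data = fetch_quality("jdaln", "dgx-spark-inference-stack")
    print(json.dumps(data, indent=2))
```

Within the free tier noted above (100 requests/day without a key), no authentication header is needed; how a key is passed once you have one is not specified on this page.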
Higher-rated alternatives
vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang: SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN: MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference: Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero: TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...