jdaln/dgx-spark-inference-stack
Serve the home! An inference stack for your Nvidia DGX Spark, a.k.a. the Grace Blackwell AI supercomputer on your desk. Mostly vLLM-based for now.
This project helps owners of an Nvidia DGX Spark AI supercomputer run and serve large language models (LLMs) efficiently on their device. It takes various LLM model files as input and exposes an API endpoint for text completions and chat interactions. It is ideal for AI researchers, enthusiasts, or small businesses with a DGX Spark who want to use its capabilities for local AI inference.
Use this if you own an Nvidia DGX Spark and want to easily deploy and manage large language models for local inference, getting the most out of your hardware.
Not ideal if you do not have an Nvidia DGX Spark or are looking for a cloud-based LLM serving solution.
Stars: 26
Forks: 3
Language: JavaScript
License: Apache-2.0
Category:
Last pushed: Feb 24, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jdaln/dgx-spark-inference-stack"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
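If you would rather call the endpoint from code than from curl, here is a minimal Python sketch. The URL is taken from the curl example above; the response schema is not documented on this page, so the result is simply printed as raw JSON, and the helper function names are illustrative, not part of any official client.

```python
# Hedged sketch: querying the pt-edge quality API for a repository.
# Only the endpoint URL is taken from this page; everything else
# (function names, response handling) is an assumption.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality data (requires network access)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Prints whatever JSON the API returns; field names are undocumented here.
    data = fetch_quality("jdaln", "dgx-spark-inference-stack")
    print(json.dumps(data, indent=2))
```

Within the free tier noted above (100 requests/day without a key), no authentication header is needed; how a key is passed once you have one is not specified on this page.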
Higher-rated alternatives
vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang: SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN: MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference: Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero: TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...