xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.
Xinference helps AI developers and researchers deploy and manage models, including large language models (LLMs), speech recognition, and multimodal models. It wraps trained models in a unified API so other applications can interact with them easily. Anyone building AI-powered applications, from chatbots to image analysis tools, can use it to put models into production.
9,129 stars. Actively maintained with 63 commits in the last 30 days. Available on PyPI.
Use this if you need to serve a variety of AI models (language, speech, multimodal) in production and want flexible deployment options behind a unified API.
Not ideal if you are a non-technical user simply looking to interact with existing AI models without needing to deploy or manage them.
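To make the "single line of code" claim concrete: Xinference exposes an OpenAI-compatible API, so switching from GPT to a locally served model mostly comes down to pointing requests at a different URL and model name. The sketch below builds such a request with only the standard library; the local endpoint, port, and model name are assumptions, not values from this page — check your own deployment.

```python
import json
from urllib import request

# Assumed local Xinference endpoint (port 9997 is a common default;
# verify against your deployment).
XINFERENCE_URL = "http://localhost:9997/v1/chat/completions"

def build_chat_request(model: str, prompt: str,
                       url: str = XINFERENCE_URL) -> request.Request:
    """Build an OpenAI-style chat-completion request.

    Swapping GPT for a local model is just a different `url` and `model`.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The same builder targets either backend, e.g.:
#   build_chat_request("gpt-4o", "Hello!",
#                      url="https://api.openai.com/v1/chat/completions")
req = build_chat_request("qwen2.5-instruct", "Hello!")  # model name is illustrative
```

Sending the request (`urllib.request.urlopen(req)`) requires a running server; the builder itself does not.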
Stars
9,129
Forks
805
Language
Python
License
Apache-2.0
Category
Last pushed
Mar 13, 2026
Commits (30d)
63
Dependencies
27
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/xorbitsai/inference"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related models
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...
tenstorrent/tt-metal
:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.