xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.
Xinference helps AI developers and researchers deploy and manage models, including large language models (LLMs), speech recognition, and multimodal models. It wraps trained models in a unified API so other applications can interact with them easily. Anyone building AI-powered applications, from chatbots to image analysis tools, can use it to put models into production.
9,129 stars. Actively maintained with 63 commits in the last 30 days. Available on PyPI.
Use this if you need to serve a variety of AI models (language, speech, multimodal) in production and want flexible deployment options behind a unified API.
Not ideal if you are a non-technical user simply looking to interact with existing AI models without needing to deploy or manage them.
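To make the "single line of code" claim concrete: Xinference exposes an OpenAI-compatible API, so switching from GPT to a locally served model mostly comes down to pointing requests at a different URL and model name. The sketch below builds such a request with only the standard library; the local endpoint, port, and model name are assumptions, not values from this page — check your own deployment.

```python
import json
from urllib import request

# Assumed local Xinference endpoint (port 9997 is a common default;
# verify against your deployment).
XINFERENCE_URL = "http://localhost:9997/v1/chat/completions"

def build_chat_request(model: str, prompt: str,
                       url: str = XINFERENCE_URL) -> request.Request:
    """Build an OpenAI-style chat-completion request.

    Swapping GPT for a local model is just a different `url` and `model`.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The same builder targets either backend, e.g.:
#   build_chat_request("gpt-4o", "Hello!",
#                      url="https://api.openai.com/v1/chat/completions")
req = build_chat_request("qwen2.5-instruct", "Hello!")  # model name is illustrative
```

Sending the request (`urllib.request.urlopen(req)`) requires a running server; the builder itself does not.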
Stars
9,129
Forks
805
Language
Python
License
Apache-2.0
Category
Last pushed
Mar 13, 2026
Commits (30d)
63
Dependencies
27
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/xorbitsai/inference"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related models
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...
tenstorrent/tt-metal
:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.