jd-opensource/xllm
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
This project helps businesses and organizations deploy large language models (LLMs) such as DeepSeek-V3.1 or Qwen2/3, especially on Chinese AI accelerators. It takes pre-trained models and serves them faster and more cost-effectively, generating text responses for applications such as intelligent customer service, risk control, or ad recommendation. Typical users are AI solution architects, MLOps engineers, and IT infrastructure managers responsible for deploying and managing AI applications.
1,081 stars. Actively maintained with 123 commits in the last 30 days.
Use this if you need to run large language models with high efficiency, low latency, and reduced costs on AI accelerators, particularly those from Chinese manufacturers.
Not ideal if you are looking for a tool to train LLMs or if your primary hardware is not an AI accelerator.
Stars: 1,081
Forks: 149
Language: C++
License: —
Category: —
Last pushed: Mar 13, 2026
Commits (30d): 123
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jd-opensource/xllm"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
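The curl example above suggests the endpoint follows an `owner/repo` path pattern. A minimal Python sketch that builds such a URL for any repository (the `/quality/transformers/` path segment is copied verbatim from the example; whether other repos are served under the same prefix is an assumption):

```python
from urllib.parse import quote

# Base path taken verbatim from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a repository.

    Assumes the /{owner}/{repo} path shape shown in the curl
    example; percent-encodes each segment defensively.
    """
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

# Reproduces the URL from the curl example for jd-opensource/xllm:
print(quality_url("jd-opensource", "xllm"))
```

The result can be passed to any HTTP client (curl, `urllib.request`, `requests`) exactly as in the example above.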
Related models
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...