jd-opensource/xllm

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

Score: 69 / 100 (Established)

This project helps businesses and organizations deploy large language models (LLMs) such as DeepSeek-V3.1 and Qwen2/3, especially on Chinese AI accelerators. It serves pre-trained models faster and more cost-effectively, generating text responses for applications such as intelligent customer service, risk control, and ad recommendation. Its end-users are AI solution architects, MLOps engineers, and IT infrastructure managers responsible for deploying and managing AI applications.

1,081 stars. Actively maintained with 123 commits in the last 30 days.

Use this if you need to run large language models with high efficiency, low latency, and reduced costs on AI accelerators, particularly those from Chinese manufacturers.

Not ideal if you are looking for a tool to train LLMs or if your primary hardware is not an AI accelerator.
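
To illustrate what deployment typically looks like: many LLM inference engines expose an OpenAI-compatible HTTP endpoint, and assuming xLLM does as well (an assumption; this page does not document its API), a client request might look like the sketch below. The host, port, and model name are placeholders.

import json
import urllib.request

# Hypothetical endpoint: assumes an OpenAI-compatible server is
# running locally; host, port, and model name are placeholders.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "qwen3",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize this support ticket."},
    ],
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=30) as resp:
    reply = json.load(resp)
    # Standard OpenAI-style response shape; adjust if the server differs.
    print(reply["choices"][0]["message"]["content"])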

Tags: AI-application-deployment, large-language-model-inference, AI-infrastructure-optimization, enterprise-AI-solutions, AI-acceleration-hardware
No package · No dependents
Maintenance: 22 / 25
Adoption: 10 / 25
Maturity: 15 / 25
Community: 22 / 25

How are scores calculated?
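
The four 25-point components above appear to sum directly to the overall rating: 22 + 10 + 15 + 22 = 69 out of 100.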

Stars: 1,081
Forks: 149
Language: C++
License: (not listed)
Last pushed: Mar 13, 2026
Commits (30d): 123

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jd-opensource/xllm"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
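
To consume the endpoint programmatically rather than via curl, a minimal Python sketch follows. Only the URL comes from this page; the JSON field names are assumptions about the response shape and may differ from what the API actually returns.

import json
import urllib.request

# URL taken verbatim from the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/jd-opensource/xllm"

def fetch_quality(url: str) -> dict:
    """Fetch the quality report and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    report = fetch_quality(URL)
    # Key names below are guesses based on the page layout (an overall
    # score plus four 25-point components); adjust to the real keys.
    print(report.get("score"), report.get("maintenance"))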