xllm and ZhiLight
Both are high-performance LLM inference engines, making them direct competitors in the same category.
About xllm
jd-opensource/xllm
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
This project helps businesses and organizations deploy large language models (LLMs) such as DeepSeek-V3.1 or Qwen2/3, especially on Chinese AI accelerators. It takes pre-trained models and runs them faster and more cost-effectively, serving text responses for applications like intelligent customer service, risk control, or ad recommendation. Its end users are AI solution architects, MLOps engineers, and IT infrastructure managers responsible for deploying and managing AI applications.
About ZhiLight
zhihu/ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
ZhiLight is a specialized engine designed to speed up text generation from large language models (LLMs) such as Llama and its variants. By optimizing how a trained model runs on NVIDIA GPUs, it delivers lower response latency and higher throughput (more tokens generated per second). The tool targets AI engineers and machine learning operations specialists who deploy and manage LLMs in production.