vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
This project is a community-maintained hardware plugin that lets machine learning engineers and researchers efficiently deploy and run large language models (LLMs) on Huawei Ascend NPUs through vLLM. It supports popular open-source models (Transformer-based, MoE, and multi-modal LLMs) and optimizes their execution for faster inference and better resource utilization. It is aimed at anyone building or experimenting with LLMs who needs to maximize performance on Ascend hardware.
1,773 stars. Actively maintained with 300 commits in the last 30 days.
Use this if you are deploying or fine-tuning large language models and want to achieve optimal performance on Huawei Ascend NPUs.
Not ideal if you are working with non-LLM machine learning models or primarily use NVIDIA GPUs or other hardware accelerators.
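As a rough illustration, the sketch below shows how inference typically looks once the plugin is installed alongside vLLM (for example, pip install vllm vllm-ascend) on a machine with Ascend NPUs and the CANN toolkit set up. The model name is only an example, and it is assumed that the standard vLLM Python API works unchanged because vllm-ascend registers itself as a vLLM platform plugin.

from vllm import LLM, SamplingParams

prompts = ["Explain what an NPU is in one sentence."]
sampling_params = SamplingParams(temperature=0.7, max_tokens=64)

# Once vllm-ascend is installed, vLLM picks up the Ascend platform plugin,
# so model loading and generation mirror standard vLLM usage.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # example model, not prescribed by the project
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)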
Stars
1,773
Forks
912
Language
C++
License
Apache-2.0
Category
llm-tools
Last pushed
Mar 13, 2026
Commits (30d)
300
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/vllm-project/vllm-ascend"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
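For scripted access, a minimal sketch using only the Python standard library is shown below. The response schema is not documented on this page, so it simply prints whatever JSON the endpoint returns; no API key is assumed, which stays within the 100-requests/day tier.

import json
import urllib.request

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/vllm-project/vllm-ascend")

# Fetch the quality record and pretty-print the raw JSON response.
with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))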
Related tools
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
SemiAnalysisAI/InferenceX
Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X...
sophgo/tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
uccl-project/uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache...
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.