vllm-project/vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

Score: 73 / 100 (Verified)

This project helps machine learning engineers and researchers deploy and run large language models (LLMs) efficiently on Huawei Ascend NPUs. It optimizes the execution of popular open-source models, including transformer-based, MoE, and multi-modal LLMs, for faster inference and better resource utilization. It's for anyone building or experimenting with LLMs who needs to maximize performance on Ascend hardware.

1,773 stars. Actively maintained with 300 commits in the last 30 days.

Use this if you are deploying or fine-tuning large language models and want to achieve optimal performance on Huawei Ascend NPUs.

Not ideal if you are working with non-LLM machine learning models or primarily use NVIDIA GPUs or other hardware accelerators.

Topics: large-language-models, machine-learning-deployment, AI-inference, model-optimization, Ascend-NPU
No published package · No dependents
Maintenance 22 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 25 / 25

How are scores calculated?
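As listed above, the four subscores (each out of 25) add up to the overall score: 22 + 10 + 16 + 25 = 73.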

Stars: 1,773
Forks: 912
Language: C++
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 300

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/vllm-project/vllm-ascend"

Open to everyone: 100 requests/day with no API key required; a free key raises the limit to 1,000 requests/day.
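For programmatic access, here is a minimal Python sketch using only the standard library. The endpoint URL is taken from the curl example above; that the response body is JSON is an assumption, so the script simply pretty-prints whatever object comes back rather than relying on specific field names:

import json
import urllib.request

# Quality-score endpoint from the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/vllm-project/vllm-ascend"

def fetch_quality(url=URL):
    # Assumes the endpoint returns a JSON object; inspect the raw
    # response if that assumption does not hold.
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(json.dumps(fetch_quality(), indent=2))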