vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
This project is a community-maintained hardware plugin that lets machine learning engineers and researchers efficiently deploy and run large language models (LLMs) on Huawei Ascend NPUs through vLLM. It supports popular open-source models (Transformer-based, MoE, and multi-modal LLMs) and optimizes their execution for faster inference and better resource utilization. It is aimed at anyone building or experimenting with LLMs who needs to maximize performance on Ascend hardware.
1,773 stars. Actively maintained with 300 commits in the last 30 days.
Use this if you are deploying or fine-tuning large language models and want to achieve optimal performance on Huawei Ascend NPUs.
Not ideal if you are working with non-LLM machine learning models or primarily use NVIDIA GPUs or other hardware accelerators.
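As a rough illustration, the sketch below shows how inference typically looks once the plugin is installed alongside vLLM (for example, pip install vllm vllm-ascend) on a machine with Ascend NPUs and the CANN toolkit set up. The model name is only an example, and it is assumed that the standard vLLM Python API works unchanged because vllm-ascend registers itself as a vLLM platform plugin.

from vllm import LLM, SamplingParams

prompts = ["Explain what an NPU is in one sentence."]
sampling_params = SamplingParams(temperature=0.7, max_tokens=64)

# Once vllm-ascend is installed, vLLM picks up the Ascend platform plugin,
# so model loading and generation mirror standard vLLM usage.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # example model, not prescribed by the project
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)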
Stars
1,773
Forks
912
Language
C++
License
Apache-2.0
Category
llm-tools
Last pushed
Mar 13, 2026
Commits (30d)
300
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/vllm-project/vllm-ascend"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
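For scripted access, a minimal sketch using only the Python standard library is shown below. The response schema is not documented on this page, so it simply prints whatever JSON the endpoint returns; no API key is assumed, which stays within the 100-requests/day tier.

import json
import urllib.request

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/vllm-project/vllm-ascend")

# Fetch the quality record and pretty-print the raw JSON response.
with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))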
Related tools
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
SemiAnalysisAI/InferenceX
Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X...
sophgo/tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
uccl-project/uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache...
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.