AXERA-TECH/ax-llm
Explore LLM model deployment based on AXera's AI chips
This project helps AI developers and engineers deploy large language models (LLMs) and vision-language models (VLMs) efficiently on AXera's AI chips. It takes pre-trained LLM/VLM models and optimizes them to run directly on the AX650A/N and AX630C, offering a fast way to evaluate model performance on the hardware and to build custom edge AI applications such as specialized on-device assistants.
Use this if you are an AI developer or embedded systems engineer working with AXera AI chips and need to deploy large language models or multimodal models for high-performance edge computing.
Not ideal if you are not working with AXera AI chips or if you are looking for a general-purpose LLM inference solution for standard CPU/GPU platforms.
Stars: 142
Forks: 22
Language: C++
License: BSD-3-Clause
Category: transformers
Last pushed: Mar 10, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/AXERA-TECH/ax-llm"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
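As an illustration, the record can be fetched and inspected from the command line. A minimal sketch, assuming the endpoint returns a JSON body (the response schema is not documented here) and using the keyless free tier:

# Fetch this repo's quality record and pretty-print it.
# Assumes a JSON response; -f makes curl fail on HTTP errors instead of piping an error page.
curl -sf "https://pt-edge.onrender.com/api/v1/quality/transformers/AXERA-TECH/ax-llm" \
  | python3 -m json.tool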
Related models
PaddlePaddle/FastDeploy
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
skyzh/tiny-llm
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny...
ServerlessLLM/ServerlessLLM
Serverless LLM Serving for Everyone.
AmpereComputingAI/ampere_model_library
AML's goal is to make benchmarking of various AI architectures on Ampere CPUs a pleasurable experience :)