sophgo/LLM-TPU
Run generative AI models on Sophgo BM1684X/BM1688
This project helps you integrate and run generative AI models, primarily large language models (LLMs) and vision-language models (VLMs), on Sophgo's BM1684X and BM1688 AI chips. It takes pre-trained models as input and produces optimized versions that run efficiently on this specialized hardware, enabling on-device AI. It is aimed at AI application developers, hardware integrators, and embedded system engineers who want to deploy advanced models on Sophgo hardware.
Use this if you need to deploy and run sophisticated generative AI models, especially large language models or multimodal models, efficiently on Sophgo BM1684X or BM1688 AI chips.
Not ideal if you are working with AI models that are not primarily generative LLMs or VLMs, or if your target deployment hardware is not a Sophgo BM1684X/BM1688 chip.
Stars: 271
Forks: 48
Language: C++
License: —
Category:
Last pushed: Mar 09, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/sophgo/LLM-TPU"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Related tools
thu-pacman/chitu
High-performance inference framework for large language models, focusing on efficiency,...
NotPunchnox/rkllama
Ollama alternative for Rockchip NPU: An efficient solution for running AI and Deep learning...
Deep-Spark/DeepSparkHub
DeepSparkHub selects hundreds of application algorithms and models, covering various fields of...
howard-hou/VisualRWKV
VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle...
bentoml/llm-inference-handbook
Everything you need to know about LLM inference