JackZeng0208/llama.cpp-android-tutorial
llama.cpp tutorial on Android phone
This project guides developers through setting up and running large language models (LLMs) such as Llama directly on Android phones equipped with Qualcomm Snapdragon processors. It shows how to compile the `llama.cpp` library to leverage the phone's Adreno GPU for faster inference. The result is a working `llama.cpp` build, optionally driven from Python, capable of local LLM inference on the device. It is aimed at Android app developers and researchers interested in deploying and evaluating LLMs on mobile hardware.
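The compile step described above can be sketched with the Android NDK's CMake toolchain. This is a minimal sketch, not the tutorial's exact commands: the `$ANDROID_NDK` path, the ABI/platform values, and the `GGML_OPENCL` flag (the OpenCL backend that targets Adreno GPUs) are assumptions to check against the repo's README.

```shell
# Sketch: cross-compile llama.cpp for an arm64 Android device.
# Assumes $ANDROID_NDK points at an installed Android NDK; the exact
# GPU-backend flag used by this tutorial may differ (check its README).
cmake -B build-android \
  -DCMAKE_TOOLCHAIN_FILE="$ANDROID_NDK/build/cmake/android.toolchain.cmake" \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28 \
  -DGGML_OPENCL=ON    # assumed flag for the OpenCL (Adreno) backend
cmake --build build-android --config Release -j
```

The resulting binaries would then be pushed to the phone (e.g. via `adb push`) and run from a shell on the device.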
155 stars. No commits in the last 6 months.
Use this if you are a developer looking to deploy and run large language models directly on an Android device with a Qualcomm Snapdragon SoC, utilizing its Adreno GPU for accelerated performance.
Not ideal if you are a general user wanting a ready-to-use LLM app, or if your Android device does not have a Qualcomm Snapdragon processor with an Adreno GPU.
| Stat | Value |
|---|---|
| Stars | 155 |
| Forks | 12 |
| Language | — |
| License | MIT |
| Category | — |
| Last pushed | May 02, 2025 |
| Commits (30d) | 0 |
Get this data via API
```shell
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JackZeng0208/llama.cpp-android-tutorial"
```
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
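The same keyless endpoint can be called from Python with the standard library. This is a minimal sketch: the response schema is an assumption, so the code just pretty-prints whatever JSON the API returns.

```python
# Sketch: fetch this repo's quality data from the public API endpoint
# shown above (no key needed, 100 requests/day).
import json
import urllib.error
import urllib.request

url = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/JackZeng0208/llama.cpp-android-tutorial")

try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    # Schema is undocumented here; just pretty-print the payload.
    print(json.dumps(data, indent=2))
except (urllib.error.URLError, TimeoutError) as exc:
    # Network access may be unavailable; report instead of crashing.
    print(f"request failed: {exc}")
```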
Higher-rated alternatives
- beehive-lab/GPULlama3.java: GPU-accelerated Llama3.java inference in pure Java using TornadoVM.
- gitkaz/mlx_gguf_server: This is a FastAPI based LLM server. Load multiple LLM models (MLX or llama.cpp) simultaneously...
- srgtuszy/llama-cpp-swift: Swift bindings for llama-cpp library.
- awinml/llama-cpp-python-bindings: Run fast LLM Inference using Llama.cpp in Python.
- RhinoDevel/mt_llm: Pure C wrapper library to use llama.cpp with Linux and Windows as simply as possible.