beehive-lab/GPULlama3.java
GPU-accelerated Llama3.java inference in pure Java using TornadoVM.
This project lets Java developers integrate large language models (LLMs) such as Llama3, Mistral, and Phi-3 directly into their applications. You provide a model in GGUF format, and the library runs GPU-accelerated inference in pure Java via TornadoVM, speeding up text generation and AI-driven features. It is aimed at Java developers building AI-powered applications or services that need efficient, local LLM inference.
Use this if you are a Java developer building applications that require fast, local inference from large language models and you have access to NVIDIA GPUs or other OpenCL-compatible hardware.
Not ideal if you are not a Java developer, do not have access to GPU hardware, or need to run models other than those supported (Llama3, Mistral, Qwen, Phi-3, IBM Granite in GGUF format).
Stars
238
Forks
28
Language
Java
License
MIT
Category
Last pushed
Mar 11, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/beehive-lab/GPULlama3.java"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
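The same request can be made from Java itself using the standard `java.net.http` client. This is a minimal sketch assuming only the endpoint shown in the curl example above; the class name, the `buildRequest` helper, and the assumption that the response body is JSON are all illustrative, not part of the API's documentation.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class QualityApiExample {

    // Endpoint copied from the curl example above.
    static final String ENDPOINT =
        "https://pt-edge.onrender.com/api/v1/quality/transformers/beehive-lab/GPULlama3.java";

    // Helper (hypothetical name) that builds the unauthenticated GET request.
    static HttpRequest buildRequest() {
        return HttpRequest.newBuilder(URI.create(ENDPOINT)).GET().build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest();
        System.out.println("GET " + request.uri());

        // Pass any argument to actually send the request
        // (kept behind a flag so the example runs without network access).
        if (args.length > 0) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body()); // assumed to be a JSON document
        }
    }
}
```

The free tier (100 requests/day) needs no key; how an API key is supplied for the higher tier is not documented here, so it is not shown.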
Related repositories
gitkaz/mlx_gguf_server
A FastAPI-based LLM server that can load multiple LLM models (MLX or llama.cpp) simultaneously...
srgtuszy/llama-cpp-swift
Swift bindings for llama-cpp library
JackZeng0208/llama.cpp-android-tutorial
llama.cpp tutorial on Android phone
awinml/llama-cpp-python-bindings
Run fast LLM Inference using Llama.cpp in Python
RhinoDevel/mt_llm
Pure C wrapper library for using llama.cpp on Linux and Windows as simply as possible.