beehive-lab/GPULlama3.java

GPU-accelerated Llama3.java inference in pure Java using TornadoVM.

Quality score: 51/100 (Established)

This project lets Java developers integrate large language models (LLMs) such as Llama3, Mistral, and Phi-3 directly into their applications. You provide a model in GGUF format, and the project runs inference on GPU hardware via TornadoVM, giving your Java application fast, local text generation and AI-driven features. It is aimed at Java developers building AI-powered applications or services that need efficient, local LLM inference.
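Since the runtime expects models in GGUF format, it can be worth checking that a downloaded file really is GGUF before loading it. A minimal sketch in plain Java: the 4-byte magic "GGUF" (0x46554747 little-endian) comes from the GGUF specification, and `GgufCheck` is a hypothetical helper, not part of this project.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sanity-checks that a model file starts with the GGUF magic number. */
public class GgufCheck {
    // Per the GGUF spec, files begin with the ASCII bytes "GGUF",
    // i.e. the little-endian 32-bit value 0x46554747.
    static final int GGUF_MAGIC = 0x46554747;

    static boolean isGguf(Path model) throws IOException {
        byte[] header = new byte[4];
        try (var in = Files.newInputStream(model)) {
            // A file shorter than 4 bytes cannot be a valid GGUF model.
            if (in.readNBytes(header, 0, 4) < 4) return false;
        }
        return ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getInt() == GGUF_MAGIC;
    }

    public static void main(String[] args) throws IOException {
        Path model = Path.of(args[0]);
        System.out.println(model + (isGguf(model) ? ": GGUF model" : ": not a GGUF file"));
    }
}
```

Running the check up front gives a clearer error than handing an arbitrary file to the inference engine.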


Use this if you are a Java developer building applications that require fast, local inference from large language models and you have access to NVIDIA GPUs or other OpenCL-compatible hardware.

Not ideal if you are not a Java developer, do not have access to GPU hardware, or need to run models other than those supported (Llama3, Mistral, Qwen, Phi-3, IBM Granite in GGUF format).

Tags: Java development, AI application building, Large Language Models, Local inference, GPU acceleration
No package published · No dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 15 / 25
Community 16 / 25


Stars: 238
Forks: 28
Language: Java
License: MIT
Last pushed: Mar 11, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/beehive-lab/GPULlama3.java"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.