beehive-lab/GPULlama3.java
GPU-accelerated Llama3.java inference in pure Java using TornadoVM.
This project lets Java developers integrate large language models (LLMs) such as Llama3, Mistral, and Phi-3 directly into their applications. You provide a model in GGUF format, and the library runs GPU-accelerated inference in pure Java via TornadoVM, speeding up text generation and AI-driven features. It is aimed at Java developers building AI-powered applications or services that need efficient, local LLM inference.
Use this if you are a Java developer building applications that require fast, local inference from large language models and you have access to NVIDIA GPUs or other OpenCL-compatible hardware.
Not ideal if you are not a Java developer, do not have access to GPU hardware, or need to run models other than those supported (Llama3, Mistral, Qwen, Phi-3, IBM Granite in GGUF format).
Stars
238
Forks
28
Language
Java
License
MIT
Category
Last pushed
Mar 11, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/beehive-lab/GPULlama3.java"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
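The same request can be made from Java itself using the standard `java.net.http` client. This is a minimal sketch assuming only the endpoint shown in the curl example above; the class name, the `buildRequest` helper, and the assumption that the response body is JSON are all illustrative, not part of the API's documentation.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class QualityApiExample {

    // Endpoint copied from the curl example above.
    static final String ENDPOINT =
        "https://pt-edge.onrender.com/api/v1/quality/transformers/beehive-lab/GPULlama3.java";

    // Helper (hypothetical name) that builds the unauthenticated GET request.
    static HttpRequest buildRequest() {
        return HttpRequest.newBuilder(URI.create(ENDPOINT)).GET().build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest();
        System.out.println("GET " + request.uri());

        // Pass any argument to actually send the request
        // (kept behind a flag so the example runs without network access).
        if (args.length > 0) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body()); // assumed to be a JSON document
        }
    }
}
```

The free tier (100 requests/day) needs no key; how an API key is supplied for the higher tier is not documented here, so it is not shown.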
Related repositories
gitkaz/mlx_gguf_server
A FastAPI-based LLM server that can load multiple LLM models (MLX or llama.cpp) simultaneously...
srgtuszy/llama-cpp-swift
Swift bindings for llama-cpp library
JackZeng0208/llama.cpp-android-tutorial
llama.cpp tutorial on Android phone
awinml/llama-cpp-python-bindings
Run fast LLM Inference using Llama.cpp in Python
RhinoDevel/mt_llm
Pure C wrapper library for using llama.cpp on Linux and Windows as simply as possible.