LLM Docker Deployments: Transformer Models

There are 20 LLM Docker deployment models tracked. 2 score 50 or above (Established tier). The highest-rated is beehive-lab/GPULlama3.java at 51/100 with 238 stars.

Get all 20 projects as JSON:

```bash
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-docker-deployments&limit=20"
```

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
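
If you prefer scripting over raw curl, the same request is easy to make from Python. This is a minimal sketch; the `projects`, `model`, `score`, and `tier` keys are assumptions about the response shape, not documented fields, so adjust them to whatever the endpoint actually returns.

```python
import requests  # third-party: pip install requests

# Same endpoint and query parameters as the curl example above.
url = "https://pt-edge.onrender.com/api/v1/datasets/quality"
params = {
    "domain": "transformers",
    "subcategory": "llm-docker-deployments",
    "limit": 20,
}

resp = requests.get(url, params=params, timeout=30)
resp.raise_for_status()
data = resp.json()

# ASSUMPTION: the payload contains a "projects" list whose items carry
# "model", "score", and "tier" fields. Inspect `data` to confirm.
for project in data.get("projects", []):
    print(project.get("model"), project.get("score"), project.get("tier"))
```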

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | beehive-lab/GPULlama3.java | GPU-accelerated Llama3.java inference in pure Java using TornadoVM. | 51 | Established |
| 2 | gitkaz/mlx_gguf_server | This is a FastAPI based LLM server. Load multiple LLM models (MLX or... | 50 | Established |
| 3 | srgtuszy/llama-cpp-swift | Swift bindings for the llama-cpp library | 44 | Emerging |
| 4 | JackZeng0208/llama.cpp-android-tutorial | llama.cpp tutorial on an Android phone | 40 | Emerging |
| 5 | awinml/llama-cpp-python-bindings | Run fast LLM inference using Llama.cpp in Python | 37 | Emerging |
| 6 | RhinoDevel/mt_llm | Pure C wrapper library to use llama.cpp with Linux and Windows as simple as... | 36 | Emerging |
| 7 | dougeeai/llama-cpp-python-wheels | Pre-built wheels for llama-cpp-python across platforms and CUDA versions | 34 | Emerging |
| 8 | GURPREETKAURJETHRA/Ollama-UseCases | This repo brings numerous use cases from the open-source Ollama | 34 | Emerging |
| 9 | lennartpollvogt/ollama-instructor | Python library for the instruction and reliable validation of structured... | 33 | Emerging |
| 10 | AbhinaavRamesh/ollama-local-serve | Local LLM infrastructure for distributed AI applications. Serve... | 32 | Emerging |
| 11 | muhac/llm-actions | Run LLMs for inference in GitHub Actions - add to your workflow! | 29 | Experimental |
| 12 | rookiemann/vllm-windows-build | Native Windows build patches for vLLM v0.14.1: MSVC 2022 + CUDA 12.6, 26... | 26 | Experimental |
| 13 | nicholasyager/llama-cpp-guidance | A guidance compatibility layer for llama-cpp-python | 26 | Experimental |
| 14 | onidahabitual85/llm-server | Launch and optimize llama.cpp servers automatically across Linux, macOS, and... | 23 | Experimental |
| 15 | thansen0/fastllm.cpp | A low-latency, fault-tolerant API for accessing LLMs, written in C++ using llama.cpp. | 23 | Experimental |
| 16 | rookiemann/llama-cpp-python-py314-cuda131-wheel | GPU-accelerated llama-cpp-python 0.3.16 wheel for Python 3.14 (CUDA 13.1, Windows) | 21 | Experimental |
| 17 | andrewginns/LocalLLM | Configurations for a locally hosted LLM and applications leveraging it | 21 | Experimental |
| 18 | frost-beta/llama2-high-level-cpp | Inference Llama2 with high-level C++. | 21 | Experimental |
| 19 | abhishekrana/llm-service | RESTful service with LLMs (Large Language Models) running locally | 17 | Experimental |
| 20 | caiomadeira/llama2-psp | Llama 2 inference in C on the PlayStation Portable (PSP). | 15 | Experimental |
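
The tiers appear to track score bands. Below is a minimal sketch of the apparent mapping, inferred only from the scores listed above: 50 and up is Established, and the Emerging/Experimental boundary is assumed to sit at 30, since the listing shows 32 as Emerging and 29 as Experimental.

```python
def tier_for(score: int) -> str:
    """Map a quality score to its apparent tier.

    Cutoffs are inferred from the listing, not from any published
    rubric; the exact Emerging/Experimental boundary is an assumption.
    """
    if score >= 50:
        return "Established"
    if score >= 30:  # ASSUMED boundary: listing shows 32 Emerging, 29 Experimental
        return "Emerging"
    return "Experimental"

# Spot-check against entries from the table above.
assert tier_for(51) == "Established"
assert tier_for(44) == "Emerging"
assert tier_for(29) == "Experimental"
```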