Apple Silicon LLM Inference Tools

Tools and frameworks for optimizing LLM inference, training, and deployment specifically on Apple Silicon (M1/M2/M3) using the MLX framework. Includes server implementations, UI wrappers, and performance optimization utilities. Does NOT include general LLM frameworks, non-Apple-specific inference servers, or tools without native MLX/Metal support.

There are 42 Apple Silicon LLM inference tools tracked. Six score above 50 (Established tier). The highest-rated is jundot/omlx at 62/100 with 4,057 stars. Four of the top 10 are actively maintained.

Get all 42 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=apple-silicon-llm-inference&limit=20"
```

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
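The endpoint above can also be queried programmatically. A minimal Python sketch: it rebuilds the same query URL the curl example uses and ranks a fetched payload by score. The response shape (a `projects` list with `name` and `score` fields) is an assumption for illustration, not documented by the API.

```python
# Sketch: build the dataset query URL and rank an already-fetched payload.
# ASSUMPTION: the JSON response contains a "projects" list whose items
# carry "name" and "score" fields -- adjust to the real schema.
import json
from urllib.parse import urlencode

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Assemble the same URL the curl example hits."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE}?{query}"

def top_projects(payload: str, n: int = 5) -> list:
    """Sort a JSON payload by score, highest first (assumed field names)."""
    projects = json.loads(payload).get("projects", [])
    return sorted(projects, key=lambda p: p.get("score", 0), reverse=True)[:n]

# Fetch with e.g. urllib.request.urlopen(build_url("llm-tools",
# "apple-silicon-llm-inference")) -- 100 requests/day without a key.
```

Bumping `limit` retrieves more of the tracked projects in one call.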

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | jundot/omlx | LLM inference server with continuous batching & SSD caching for Apple... | 62 | Established |
| 2 | josStorer/RWKV-Runner | A RWKV management and startup tool, full automation, only 8MB. And provides... | 59 | Established |
| 3 | waybarrios/vllm-mlx | OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and... | 58 | Established |
| 4 | jordanhubbard/nanolang | A tiny experimental language designed to be targeted by coding LLMs | 58 | Established |
| 5 | akivasolutions/tightwad | Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. Speculative... | 52 | Established |
| 6 | petrukha-ivan/mlx-swift-structured | Structured output generation in Swift | 51 | Established |
| 7 | parasail-ai/openai-batch | Make OpenAI batch easy to use. | 48 | Emerging |
| 8 | mit-han-lab/TinyChatEngine | TinyChatEngine: On-Device LLM Inference Library | 45 | Emerging |
| 9 | da-z/mlx-ui | A simple UI / Web / Frontend for MLX mlx-lm using Streamlit. | 43 | Emerging |
| 10 | icppWorld/icgpt | On-chain LLMs for the Internet Computer | 41 | Emerging |
| 11 | eelbaz/dgx-spark-vllm-setup | One-command vLLM installation for NVIDIA DGX Spark with Blackwell GB10 GPUs... | 41 | Emerging |
| 12 | OpenLMLab/MOSS_Vortex | Moss Vortex is a lightweight and high-performance deployment and inference... | 40 | Emerging |
| 13 | Sub-Soft/Siliv | macOS menu-bar utility to adjust Apple Silicon GPU VRAM allocation | 38 | Emerging |
| 14 | uncSoft/anubis-oss | Local LLM Testing & Benchmarking for Apple Silicon | 37 | Emerging |
| 15 | makit/makit-llm-lambda | Example showing how to run an LLM fully inside an AWS Lambda Function | 36 | Emerging |
| 16 | druide67/asiai | Multi-engine LLM benchmark & monitoring CLI for Apple Silicon | 35 | Emerging |
| 17 | N1k1tung/infer-ring | Infer Ring is an iOS and macOS app that facilitates cross-device LLM... | 34 | Emerging |
| 18 | seasonjs/rwkv | Pure Go for RWKV | 34 | Emerging |
| 19 | Mizistein/omlx | Optimize LLM inference on Mac with continuous batching and SSD caching... | 33 | Emerging |
| 20 | unit-mesh/edge-infer | EdgeInfer enables efficient edge intelligence by running small AI models,... | 31 | Emerging |
| 21 | ziozzang/Mac_mlx_phi-2_server | Test server code for the Phi-2 model; supports the OpenAI API spec | 31 | Emerging |
| 22 | GusLovesMath/Local_LLM_Training_Apple_Silicon | Created and enhanced a local LLM training system on Apple Silicon with MLX... | 30 | Emerging |
| 23 | AI-DarwinLabs/vllm-hpc-installer | Automated installation script for vLLM on HPC systems with ROCm support,... | 30 | Emerging |
| 24 | koji/llm_api_template | API template for LLM models with llama.cpp | 24 | Experimental |
| 25 | leszkolukasz/moondream-cpp | Moondream VLLM for C++/Qt | 23 | Experimental |
| 26 | ndluna21/nanochat-ascend | Run nanochat training efficiently on Huawei Ascend NPUs with minimal code... | 22 | Experimental |
| 27 | fabriziosalmi/silicondev | Local LLM fine-tuning and chat for Apple Silicon | 22 | Experimental |
| 28 | vivekptnk/tinybrain | Swift-native on-device LLM inference with live transformer visualization (X-Ray Mode) | 22 | Experimental |
| 29 | deeflect/mcclaw | Find which local LLMs actually run on your Mac. 340+ models, hardware-aware... | 21 | Experimental |
| 30 | arunsanna/tauri-plugin-mlx | Tauri v2 plugin for local LLM inference on Apple Silicon using Apple MLX... | 21 | Experimental |
| 31 | countzero/windows_manage_large_language_models | PowerShell automation to download large language models (LLMs) from Git... | 21 | Experimental |
| 32 | jballo/VALLM | VALLM (Vision Assisted Large Language Model) is a web application that helps... | 21 | Experimental |
| 33 | Feyerabend/cc | From Code to Computation: A Modern Guide to Programming and Theory | 20 | Experimental |
| 34 | fiveoutofnine/whatcanirun | Find the best models and how to run them locally. | 18 | Experimental |
| 35 | vaccovecrana/rwkv.jni | JNI wrapper for rwkv.cpp | 18 | Experimental |
| 36 | jeorgexyz/lua-llama | Pure Lua implementation of LLaMA inference - educational project exploring... | 17 | Experimental |
| 37 | GetNyrex/strix-halo-guide | Unlock fast, local LLM inference on AMD-powered mini PCs delivering 65-87... | 14 | Experimental |
| 38 | StefanoChiodino/mlx-manager | Sugar coating on the extremely performant but not very user-friendly MLX | 14 | Experimental |
| 39 | GabrielNetoAUT/tps.sh | Benchmark local and cloud large language models on Apple Silicon by... | 13 | Experimental |
| 40 | dev4any1/hyper-stack-4j | Distributed Java-native LLM Inference Engine for commodity CPU/GPU clusters | 13 | Experimental |
| 41 | GusGitMath/Llama3_MacSilicon | Repository for running LLMs efficiently on Mac silicon (M1, M2, M3)... | 11 | Experimental |
| 42 | WilliamK112/llm-fit | Can my laptop run this model? Instant local LLM fit + speed estimator. | 11 | Experimental |