Apple Silicon LLM Inference Tools

Tools and frameworks for optimizing LLM inference, training, and deployment specifically on Apple Silicon (M1/M2/M3) using the MLX framework. Includes server implementations, UI wrappers, and performance optimization utilities. Does NOT include general LLM frameworks, non-Apple-specific inference servers, or tools without native MLX/Metal support.

There are 42 Apple Silicon LLM inference tools tracked. Six score above 50 (Established tier). The highest-rated is jundot/omlx at 62/100 with 4,057 stars. Four of the top 10 are actively maintained.

Get all 42 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=apple-silicon-llm-inference&limit=20"
```

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
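The endpoint above can also be queried programmatically. A minimal Python sketch: it rebuilds the same query URL the curl example uses and ranks a fetched payload by score. The response shape (a `projects` list with `name` and `score` fields) is an assumption for illustration, not documented by the API.

```python
# Sketch: build the dataset query URL and rank an already-fetched payload.
# ASSUMPTION: the JSON response contains a "projects" list whose items
# carry "name" and "score" fields -- adjust to the real schema.
import json
from urllib.parse import urlencode

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Assemble the same URL the curl example hits."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE}?{query}"

def top_projects(payload: str, n: int = 5) -> list:
    """Sort a JSON payload by score, highest first (assumed field names)."""
    projects = json.loads(payload).get("projects", [])
    return sorted(projects, key=lambda p: p.get("score", 0), reverse=True)[:n]

# Fetch with e.g. urllib.request.urlopen(build_url("llm-tools",
# "apple-silicon-llm-inference")) -- 100 requests/day without a key.
```

Bumping `limit` retrieves more of the tracked projects in one call.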

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | jundot/omlx | LLM inference server with continuous batching & SSD caching for Apple... | 62 | Established |
| 2 | josStorer/RWKV-Runner | A RWKV management and startup tool, full automation, only 8MB. And provides... | 59 | Established |
| 3 | waybarrios/vllm-mlx | OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and... | 58 | Established |
| 4 | jordanhubbard/nanolang | A tiny experimental language designed to be targeted by coding LLMs | 58 | Established |
| 5 | akivasolutions/tightwad | Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. Speculative... | 52 | Established |
| 6 | petrukha-ivan/mlx-swift-structured | Structured output generation in Swift | 51 | Established |
| 7 | parasail-ai/openai-batch | Make OpenAI batch easy to use. | 48 | Emerging |
| 8 | mit-han-lab/TinyChatEngine | TinyChatEngine: On-Device LLM Inference Library | 45 | Emerging |
| 9 | da-z/mlx-ui | A simple UI / Web / Frontend for MLX mlx-lm using Streamlit. | 43 | Emerging |
| 10 | icppWorld/icgpt | On-chain LLMs for the Internet Computer | 41 | Emerging |
| 11 | eelbaz/dgx-spark-vllm-setup | One-command vLLM installation for NVIDIA DGX Spark with Blackwell GB10 GPUs... | 41 | Emerging |
| 12 | OpenLMLab/MOSS_Vortex | Moss Vortex is a lightweight and high-performance deployment and inference... | 40 | Emerging |
| 13 | Sub-Soft/Siliv | macOS menu-bar utility to adjust Apple Silicon GPU VRAM allocation | 38 | Emerging |
| 14 | uncSoft/anubis-oss | Local LLM Testing & Benchmarking for Apple Silicon | 37 | Emerging |
| 15 | makit/makit-llm-lambda | Example showing how to run an LLM fully inside an AWS Lambda Function | 36 | Emerging |
| 16 | druide67/asiai | Multi-engine LLM benchmark & monitoring CLI for Apple Silicon | 35 | Emerging |
| 17 | N1k1tung/infer-ring | Infer Ring is an iOS and macOS app that facilitates cross-device LLM... | 34 | Emerging |
| 18 | seasonjs/rwkv | Pure Go for RWKV | 34 | Emerging |
| 19 | Mizistein/omlx | Optimize LLM inference on Mac with continuous batching and SSD caching... | 33 | Emerging |
| 20 | unit-mesh/edge-infer | EdgeInfer enables efficient edge intelligence by running small AI models,... | 31 | Emerging |
| 21 | ziozzang/Mac_mlx_phi-2_server | Test server code for the Phi-2 model; supports the OpenAI API spec | 31 | Emerging |
| 22 | GusLovesMath/Local_LLM_Training_Apple_Silicon | Created and enhanced a local LLM training system on Apple Silicon with MLX... | 30 | Emerging |
| 23 | AI-DarwinLabs/vllm-hpc-installer | Automated installation script for vLLM on HPC systems with ROCm support,... | 30 | Emerging |
| 24 | koji/llm_api_template | API template for LLM models with llama.cpp | 24 | Experimental |
| 25 | leszkolukasz/moondream-cpp | Moondream VLLM for C++/Qt | 23 | Experimental |
| 26 | ndluna21/nanochat-ascend | Run nanochat training efficiently on Huawei Ascend NPUs with minimal code... | 22 | Experimental |
| 27 | fabriziosalmi/silicondev | Local LLM fine-tuning and chat for Apple Silicon | 22 | Experimental |
| 28 | vivekptnk/tinybrain | Swift-native on-device LLM inference with live transformer visualization (X-Ray Mode) | 22 | Experimental |
| 29 | deeflect/mcclaw | Find which local LLMs actually run on your Mac. 340+ models, hardware-aware... | 21 | Experimental |
| 30 | arunsanna/tauri-plugin-mlx | Tauri v2 plugin for local LLM inference on Apple Silicon using Apple MLX... | 21 | Experimental |
| 31 | countzero/windows_manage_large_language_models | PowerShell automation to download large language models (LLMs) from Git... | 21 | Experimental |
| 32 | jballo/VALLM | VALLM (Vision Assisted Large Language Model) is a web application that helps... | 21 | Experimental |
| 33 | Feyerabend/cc | From Code to Computation: A Modern Guide to Programming and Theory | 20 | Experimental |
| 34 | fiveoutofnine/whatcanirun | Find the best models and how to run them locally. | 18 | Experimental |
| 35 | vaccovecrana/rwkv.jni | JNI wrapper for rwkv.cpp | 18 | Experimental |
| 36 | jeorgexyz/lua-llama | Pure Lua implementation of LLaMA inference - educational project exploring... | 17 | Experimental |
| 37 | GetNyrex/strix-halo-guide | Unlock fast, local LLM inference on AMD-powered mini PCs delivering 65-87... | 14 | Experimental |
| 38 | StefanoChiodino/mlx-manager | Sugar coating on the extremely performant but not very user-friendly MLX | 14 | Experimental |
| 39 | GabrielNetoAUT/tps.sh | Benchmark local and cloud large language models on Apple Silicon by... | 13 | Experimental |
| 40 | dev4any1/hyper-stack-4j | Distributed Java-native LLM Inference Engine for commodity CPU/GPU clusters | 13 | Experimental |
| 41 | GusGitMath/Llama3_MacSilicon | Repository for running LLMs efficiently on Mac silicon (M1, M2, M3)... | 11 | Experimental |
| 42 | WilliamK112/llm-fit | Can my laptop run this model? Instant local LLM fit + speed estimator. | 11 | Experimental |