waybarrios/vllm-mlx
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
vllm-mlx lets developers and engineers run large language models and vision-language models locally on Apple Silicon Macs. It accepts text, images, video, or audio and, depending on the model, produces generated text, image descriptions, audio transcriptions, or embeddings. It targets anyone building or experimenting with AI applications who needs to deploy models on Apple hardware.
579 stars. Actively maintained with 58 commits in the last 30 days.
Use this if you are a developer or AI engineer building applications that use large language models or multimodal AI and want to run them efficiently on your Apple Silicon Mac.
Not ideal if you don't have an Apple Silicon Mac, or if you're a casual user looking for a pre-packaged consumer application rather than a developer tool.
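Since the server advertises an OpenAI-compatible API, a client can talk to it the same way it would talk to the OpenAI endpoint. The sketch below builds a standard chat-completions request; the base URL, port, and model identifier are assumptions for illustration, not values documented by this project.

```python
# Minimal sketch of querying a local OpenAI-compatible server such as
# vllm-mlx. BASE_URL and the model id are hypothetical placeholders.
import json

BASE_URL = "http://localhost:8000/v1"  # assumed default; check the project docs

payload = {
    "model": "mlx-community/Qwen2.5-7B-Instruct-4bit",  # hypothetical model id
    "messages": [
        {"role": "user", "content": "Describe Apple Silicon in one sentence."}
    ],
    "max_tokens": 64,
}

# Serialize the request body as the endpoint expects.
body = json.dumps(payload)

# To actually send it (requires a running server):
# import urllib.request
# req = urllib.request.Request(
#     f"{BASE_URL}/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

Because the wire format is the standard chat-completions schema, existing OpenAI SDK clients should also work by pointing their base URL at the local server.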
Stars: 579
Forks: 87
Language: Python
License: —
Category:
Last pushed: Mar 12, 2026
Commits (30d): 58
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/waybarrios/vllm-mlx"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
jundot/omlx
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the...
josStorer/RWKV-Runner
A RWKV management and startup tool, full automation, only 8MB. And provides an interface...
jordanhubbard/nanolang
A tiny experimental language designed to be targeted by coding LLMs
akivasolutions/tightwad
Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. Speculative decoding proxy gives you...
petrukha-ivan/mlx-swift-structured
Structured output generation in Swift