waybarrios/vllm-mlx

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.

Quality score: 58 / 100 (Established)

This project helps developers and engineers run large language models and vision-language models efficiently on Apple Silicon Macs. It accepts inputs such as text, images, video, or audio, processes them with a range of AI models, and produces outputs such as generated text, image descriptions, audio transcriptions, or embeddings. It's designed for anyone building or experimenting with AI solutions who needs to deploy models locally on Apple hardware.

579 stars. Actively maintained with 58 commits in the last 30 days.

Use this if you are a developer or AI engineer building applications that use large language models or multimodal AI and want to run them efficiently and quickly on your Apple Silicon Mac.

Not ideal if you don't have an Apple Silicon Mac, or if you're a casual user looking for a pre-packaged consumer application rather than a developer tool.

AI-development machine-learning-engineering LLM-deployment multimodal-AI Apple-Silicon-optimization
No license · No package · No dependents
Maintenance 22 / 25
Adoption 10 / 25
Maturity 5 / 25
Community 21 / 25


Stars: 579
Forks: 87
Language: Python
License: none
Last pushed: Mar 12, 2026
Commits (30d): 58

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/waybarrios/vllm-mlx"

Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
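If you prefer to consume the endpoint programmatically rather than via curl, a minimal Python sketch follows. The URL is taken from the listing above; the JSON field names in the final print (`score`, `stars`) are assumptions, since the response schema isn't documented here, so inspect the payload before relying on them.

```python
import json
from urllib.request import urlopen

# Endpoint from the listing; no API key is needed for up to 100 requests/day.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/waybarrios/vllm-mlx"


def fetch_quality(url: str = API_URL) -> dict:
    """Fetch the quality record and decode it as JSON."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality()
    # Field names below are assumptions -- print the full payload to confirm.
    print(data.get("score"), data.get("stars"))
```

The network call is kept inside `fetch_quality` and guarded by `__main__` so the module can be imported without firing a request.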