omlx and vllm-mlx

These are competitors offering overlapping inference server capabilities (continuous batching, Apple Silicon optimization via MLX) with different feature trade-offs—omlx emphasizes macOS integration while vllm-mlx prioritizes OpenAI API compatibility and multimodal model support.

omlx
62
Established
vllm-mlx
58
Established
Maintenance 22/25
Adoption 10/25
Maturity 11/25
Community 19/25
Maintenance 22/25
Adoption 10/25
Maturity 5/25
Community 21/25
Stars: 4,057
Forks: 306
Downloads:
Commits (30d): 448
Language: Python
License: Apache-2.0
Stars: 579
Forks: 87
Downloads:
Commits (30d): 58
Language: Python
License:
No Package No Dependents
No License No Package No Dependents

About omlx

jundot/omlx

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

oMLX helps individual developers and power users on Apple Silicon Macs efficiently run and manage large language models (LLMs) and vision-language models (VLMs) directly on their machines. It takes a model file and provides a local API endpoint and a web dashboard, allowing you to interact with models for tasks like code generation, content creation, or image analysis. This is for developers or technical users who want to run powerful AI models locally without relying on cloud services.

local-AI-inference Apple-Silicon-ML LLM-deployment VLM-applications developer-tools

About vllm-mlx

waybarrios/vllm-mlx

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.

This project helps developers and engineers working with AI applications to run large language models and vision-language models on their Apple Silicon Macs much faster. It takes various inputs like text, images, videos, or audio, processes them using different AI models, and produces outputs such as generated text, image descriptions, audio transcriptions, or embeddings. It's designed for anyone building or experimenting with AI solutions who needs to deploy models locally on Apple hardware.

AI-development machine-learning-engineering LLM-deployment multimodal-AI Apple-Silicon-optimization

Related comparisons

Scores updated daily from GitHub, PyPI, and npm data. How scores work