jundot/omlx
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
oMLX lets developers and power users on Apple Silicon Macs run and manage large language models (LLMs) and vision-language models (VLMs) locally. Point it at a model file and it serves a local API endpoint plus a web dashboard, so you can use the models for tasks like code generation, content creation, or image analysis without relying on cloud services (see the request sketch below).
4,057 stars. Actively maintained with 448 commits in the last 30 days.
Use this if you are a developer or AI enthusiast on an Apple Silicon Mac and want to run multiple large language models or vision models locally, with continuous batching for throughput and management from the menu bar.
Not ideal if you need to deploy AI models on non-Apple hardware, prefer cloud-based inference, or do not have a technical background.
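Since the overview above centers on the local API endpoint, here is a minimal request sketch. It assumes oMLX exposes an OpenAI-compatible chat completions endpoint; the port, path, and model id below are placeholders rather than oMLX's documented defaults, so check its dashboard or README for the real values.

# A minimal sketch, assuming oMLX exposes an OpenAI-compatible /v1/chat/completions endpoint.
# The base_url port and the model id are placeholders, not documented defaults.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize continuous batching in one sentence."}],
)
print(response.choices[0].message.content)

Any OpenAI-compatible client should work the same way once pointed at the local base URL.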
Stars: 4,057
Forks: 306
Language: Python
License: Apache-2.0
Category: llm-tools
Last pushed: Mar 13, 2026
Commits (30d): 448
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jundot/omlx"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
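The same data is easy to pull from a script. A minimal Python sketch using the URL from the curl command above; the response schema is not documented on this page, so the code simply prints the raw JSON payload.

# Minimal sketch: fetch the quality data shown on this page as JSON.
# No key is needed within the 100 requests/day limit.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jundot/omlx"
response = requests.get(url, timeout=10)
response.raise_for_status()
print(response.json())  # schema not documented here, so print the raw payload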
Related tools
josStorer/RWKV-Runner
An RWKV management and startup tool: fully automated, only 8 MB, and provides an interface...
waybarrios/vllm-mlx
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models...
jordanhubbard/nanolang
A tiny experimental language designed to be targeted by coding LLMs
akivasolutions/tightwad
Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. A speculative decoding proxy gives you...
petrukha-ivan/mlx-swift-structured
Structured output generation in Swift