madroidmaq/mlx-omni-server
MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. It implements OpenAI-compatible API endpoints, enabling seamless integration with existing OpenAI SDK clients while leveraging the power of local ML inference.
This project lets developers integrate AI capabilities directly into their Mac applications without sending data to external cloud services. It exposes local AI models through standard API endpoints in the style of OpenAI or Anthropic, so your existing client code can run those models on an Apple Silicon chip. This is ideal for developers building Mac applications that need secure, high-performance local AI processing.
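Because the endpoints mirror the OpenAI API, the official OpenAI Python SDK can be pointed at the local server unchanged. A minimal sketch, assuming the server's default port of 10240 (adjust base_url if your setup differs) and an example model id that you would replace with one available locally:

from openai import OpenAI

# Point the standard OpenAI client at the local MLX Omni Server.
# No real key is needed locally, but the SDK requires a non-empty value.
client = OpenAI(
    base_url="http://localhost:10240/v1",  # assumed default port
    api_key="not-needed",
)

response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",  # example id; use any model you have pulled
    messages=[{"role": "user", "content": "Say hello from Apple Silicon."}],
)
print(response.choices[0].message.content)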
Available on PyPI.
Use this if you are a developer building an application on an Apple Silicon Mac and want to use AI models locally for chat, audio processing, image generation, or embeddings, while maintaining privacy and control.
Not ideal if you need to run AI models on Windows or Linux, or if your application requires the scale and features of cloud-based AI services.
Stars: 678
Forks: 84
Language: Python
License: MIT
Category: Generative AI
Last pushed: Mar 10, 2026
Commits (30d): 0
Dependencies: 15
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/madroidmaq/mlx-omni-server"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
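For programmatic access, the keyless tier can be queried directly. A minimal Python sketch using the requests library; the response shape is whatever the API returns, printed here verbatim:

import requests

# Query the public, keyless tier (100 requests/day).
url = "https://pt-edge.onrender.com/api/v1/quality/generative-ai/madroidmaq/mlx-omni-server"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors, e.g. rate limiting
print(resp.json())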
Related tools
openvinotoolkit/model_server
A scalable inference server for models optimized with OpenVINO™
NVIDIA-NeMo/Guardrails
NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based...
generative-computing/mellea
Mellea is a library for writing generative programs.
rhesis-ai/rhesis
Open-source platform & SDK for testing LLM and agentic apps. Define expected behavior, generate...
taco-group/OpenEMMA
OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.