matt-k-wong/mlx-flash
Lightning-fast MLX utilities and optimizations for Apple Silicon
This project lets you run very large language models (tens or hundreds of billions of parameters) directly on an Apple Silicon Mac, even one with limited memory. Instead of shrinking or altering the model, it streams the model's weights from the Mac's fast storage as they are needed, so you get text generation or analysis at full precision. It is aimed at AI practitioners, researchers, and developers who want to experiment with or deploy large models locally on Apple Silicon.
Use this if you need to run large models (30B, 70B, or more parameters) on a Mac with less RAM than the model normally requires, without compromising precision or quality.
Not ideal if you are working with smaller models that already fit comfortably in your Mac's RAM, or if you are using non-Apple hardware.
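The streaming idea described above can be illustrated in general terms. The sketch below is a generic memory-mapping demo using NumPy, not mlx-flash's actual API: mapping a weights file lets the OS page tensor data in from SSD on demand, so a file far larger than RAM can still be opened and read piecewise.

```python
import os
import tempfile

import numpy as np

# Write a small "weights" file to disk (a stand-in for a model shard).
path = os.path.join(tempfile.mkdtemp(), "layer0.npy")
weights = np.arange(12, dtype=np.float32).reshape(3, 4)
np.save(path, weights)

# Memory-map the file: no tensor data is copied into RAM up front.
# Pages are faulted in from storage only when a slice is actually touched,
# which is what makes larger-than-memory models workable.
mapped = np.load(path, mmap_mode="r")
row = np.asarray(mapped[1])  # only this row's pages are read from disk
print(row.tolist())  # [4.0, 5.0, 6.0, 7.0]
```

In practice a runtime layered on this idea would map each weight shard once and materialize only the tensors needed for the current layer's computation.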
Stars: 18
Forks: —
Language: Python
License: MIT
Category:
Last pushed: Mar 21, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/matt-k-wong/mlx-flash"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
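The endpoint above follows an ecosystem/owner/repo path pattern. A minimal sketch of calling it from Python is below; the base URL and path shape come from the curl example, but the response fields are not documented here, so this helper (`quality_url`, a hypothetical name) only builds the request URL and leaves parsing to the caller.

```python
# Hypothetical helper for the quality API shown above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{ecosystem}/{owner}/{repo}"

url = quality_url("transformers", "matt-k-wong", "mlx-flash")
print(url)
# https://pt-edge.onrender.com/api/v1/quality/transformers/matt-k-wong/mlx-flash

# To actually fetch (network call, rate-limited to 100 requests/day
# without a key), something like this would work:
#   import json, urllib.request
#   data = json.load(urllib.request.urlopen(url))
```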
Higher-rated alternatives
Blaizzy/mlx-vlm
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac...
b4rtaz/distributed-llama
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM...
armbues/SiLLM
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple...
microsoft/batch-inference
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.
armbues/SiLLM-examples
Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on...