matt-k-wong/mlx-flash

Lightning-fast MLX utilities and optimizations for Apple Silicon

Quality score: 28 / 100 (Experimental)

This project lets you run very large language models, including those with tens or hundreds of billions of parameters, directly on an Apple Silicon Mac with limited memory. Rather than shrinking or otherwise altering the model, it streams the model's components from the Mac's fast storage as they are needed, so you can generate or analyze text at full precision right away. It is aimed at AI practitioners, researchers, and developers who want to experiment with or deploy large models locally on Apple Silicon.
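
Conceptually, the trick is to keep only the weights needed for the current computation in memory at any one time. Below is a minimal sketch of that streaming idea, not the mlx-flash API itself: it lazily reads one transformer layer's tensors at a time from a safetensors checkpoint into MLX arrays. The checkpoint path and the tensor-name prefix are assumptions for illustration.

# A minimal sketch of the streaming idea, NOT the mlx-flash API:
# load one transformer layer's weights at a time from a safetensors
# checkpoint instead of materializing the whole model in RAM.
import mlx.core as mx
from safetensors import safe_open

CHECKPOINT = "model.safetensors"  # hypothetical path

def layer_weights(layer_idx):
    """Yield (name, array) pairs for one layer, read lazily from disk."""
    prefix = f"model.layers.{layer_idx}."  # assumed tensor naming scheme
    with safe_open(CHECKPOINT, framework="numpy") as f:
        for name in f.keys():
            if name.startswith(prefix):
                # Only this tensor is read from storage; the rest stay on disk.
                yield name, mx.array(f.get_tensor(name))

# Process layers one by one so peak memory is roughly one layer's weights,
# not the whole model.
for i in range(2):  # e.g. the first two layers
    weights = dict(layer_weights(i))
    mx.eval(*weights.values())  # force materialization for this layer
    print(f"layer {i}: {len(weights)} tensors loaded")
    del weights  # release before streaming the next layer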

Use this if you need to run large AI models (30B, 70B, or more parameters) on an Apple Silicon Mac with less RAM than the model would normally require, without compromising precision or quality.

Not ideal if you are working with smaller models that comfortably fit within your Mac's RAM, or if you are using non-Apple hardware.

Tags: large-language-models, on-device-ai, ai-model-deployment, apple-silicon-ml, ml-research
No package · No dependents

Maintenance: 13 / 25
Adoption: 6 / 25
Maturity: 9 / 25
Community: 0 / 25

Stars: 18
Forks:
Language: Python
License: MIT
Last pushed: Mar 21, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/matt-k-wong/mlx-flash"

Open to everyone: 100 requests/day with no key required. Get a free key for 1,000 requests/day.
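
A minimal sketch of calling the same endpoint from Python with only the standard library. The response is assumed to be JSON, but its exact fields are not documented here, so the sketch simply pretty-prints whatever comes back.

# Fetch the quality data for this repo and pretty-print the JSON response.
import json
import urllib.request

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/matt-k-wong/mlx-flash"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)  # assumes the endpoint returns a JSON body

print(json.dumps(data, indent=2))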