matt-k-wong/mlx-flash
Lightning-fast MLX utilities and optimizations for Apple Silicon
This project lets you run very large language models (tens or hundreds of billions of parameters) directly on an Apple Silicon Mac, even one with limited memory. Instead of shrinking or altering the model, it streams the model's weights from the Mac's fast storage as they are needed, so you get text generation or analysis at full precision. It is aimed at AI practitioners, researchers, and developers who want to experiment with or deploy large models locally on Apple Silicon.
Use this if you need to run large models (30B, 70B, or more parameters) on a Mac with less RAM than the model normally requires, without compromising precision or quality.
Not ideal if you are working with smaller models that already fit comfortably in your Mac's RAM, or if you are using non-Apple hardware.
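The streaming idea described above can be illustrated in general terms. The sketch below is a generic memory-mapping demo using NumPy, not mlx-flash's actual API: mapping a weights file lets the OS page tensor data in from SSD on demand, so a file far larger than RAM can still be opened and read piecewise.

```python
import os
import tempfile

import numpy as np

# Write a small "weights" file to disk (a stand-in for a model shard).
path = os.path.join(tempfile.mkdtemp(), "layer0.npy")
weights = np.arange(12, dtype=np.float32).reshape(3, 4)
np.save(path, weights)

# Memory-map the file: no tensor data is copied into RAM up front.
# Pages are faulted in from storage only when a slice is actually touched,
# which is what makes larger-than-memory models workable.
mapped = np.load(path, mmap_mode="r")
row = np.asarray(mapped[1])  # only this row's pages are read from disk
print(row.tolist())  # [4.0, 5.0, 6.0, 7.0]
```

In practice a runtime layered on this idea would map each weight shard once and materialize only the tensors needed for the current layer's computation.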
Stars: 18
Forks: —
Language: Python
License: MIT
Category:
Last pushed: Mar 21, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/matt-k-wong/mlx-flash"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
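The endpoint above follows an ecosystem/owner/repo path pattern. A minimal sketch of calling it from Python is below; the base URL and path shape come from the curl example, but the response fields are not documented here, so this helper (`quality_url`, a hypothetical name) only builds the request URL and leaves parsing to the caller.

```python
# Hypothetical helper for the quality API shown above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{ecosystem}/{owner}/{repo}"

url = quality_url("transformers", "matt-k-wong", "mlx-flash")
print(url)
# https://pt-edge.onrender.com/api/v1/quality/transformers/matt-k-wong/mlx-flash

# To actually fetch (network call, rate-limited to 100 requests/day
# without a key), something like this would work:
#   import json, urllib.request
#   data = json.load(urllib.request.urlopen(url))
```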
Higher-rated alternatives
Blaizzy/mlx-vlm
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac...
b4rtaz/distributed-llama
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM...
armbues/SiLLM
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple...
microsoft/batch-inference
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.
armbues/SiLLM-examples
Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on...