philipturner/metal-flash-attention

FlashAttention (Metal Port)

Score: 40 / 100 (Emerging)

This project helps machine learning engineers train large language models efficiently on Apple Silicon. Given a model's architecture and training data, it performs the attention calculations at the core of transformer models. The result is significantly faster training, especially in the backward pass, tuned for Apple's M-series chips.

589 stars. No commits in the last 6 months.

Use this if you are developing or training large AI models and need to maximize the performance of attention mechanisms on Apple Silicon (M1, M2, M3, M4 chips) to reduce training time and memory usage.

Not ideal if you are working with AI models on non-Apple hardware (like NVIDIA GPUs) or if your model doesn't heavily rely on the attention mechanism.

Tags: AI-model-training, large-language-models, Apple-Silicon-optimization, deep-learning-performance, transformer-architectures
Flags: Stale (6m), No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 14 / 25


Stars: 589
Forks: 38
Language: Swift
License: MIT
Last pushed: Sep 22, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/philipturner/metal-flash-attention"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
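The endpoint above appears to follow a /api/v1/quality/{category}/{owner}/{repo} pattern. A minimal shell sketch for building the request URL for any repository, assuming that pattern holds; the API-key header name shown in the comment is hypothetical, not documented here:

```shell
# Build the quality-API URL from its parts (assumed URL pattern).
BASE="https://pt-edge.onrender.com/api/v1/quality"
CATEGORY="ml-frameworks"
OWNER="philipturner"
REPO="metal-flash-attention"
URL="$BASE/$CATEGORY/$OWNER/$REPO"

# Anonymous access (100 requests/day):
echo "curl \"$URL\""

# With a free key for 1,000/day -- the header name below is a guess,
# check the API documentation for the real one:
# curl -H "X-API-Key: $PT_EDGE_KEY" "$URL"
```

Swapping OWNER and REPO should yield the quality report for any other repository in the same category.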