philipturner/metal-flash-attention
FlashAttention (Metal Port)
This project helps machine learning engineers train large language models efficiently on Apple Silicon. It provides Metal kernels for the 'attention' computation at the core of transformer models, delivering a significantly faster training process, especially in the backward pass, tailored for Apple's M-series chips.
589 stars. No commits in the last 6 months.
Use this if you are developing or training large AI models and need to maximize the performance of attention mechanisms on Apple Silicon (M1, M2, M3, M4 chips) to reduce training time and memory usage.
Not ideal if you are working with AI models on non-Apple hardware (like NVIDIA GPUs) or if your model doesn't heavily rely on the attention mechanism.
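For context, the operation this library accelerates is scaled dot-product attention. Below is a minimal, illustrative pure-Python sketch of the naive algorithm; it materializes the full seq-by-seq score matrix, which is precisely the memory cost that FlashAttention-style fused kernels (like this Metal port) avoid. This is not the library's code, just a reference for what "attention" computes.

```python
import math

def naive_attention(Q, K, V):
    """Naive scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Q, K, V are lists of row vectors (seq x d). Illustrative only:
    fused kernels never build the full score matrix like this does."""
    d = len(Q[0])
    out = []
    for q in Q:
        # One full row of the (seq, seq) score matrix per query token
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)                          # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]          # softmax over keys
        # Output row is a convex combination of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d)])
    return out

# Tiny example: 2 tokens, head dimension 2
Q = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(naive_attention(Q, Q, V))
```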
Stars: 589
Forks: 38
Language: Swift
License: MIT
Category:
Last pushed: Sep 22, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/philipturner/metal-flash-attention"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
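The curl command above can also be scripted. A minimal standard-library Python sketch is below; the URL path segments come from the curl example, but the JSON response schema and the mechanism for supplying an API key are not documented here, so the fetch helper just returns the raw response body.

```python
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(owner: str, repo: str, category: str = "ml-frameworks") -> str:
    """Build the endpoint URL for a repository's quality record."""
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str, category: str = "ml-frameworks") -> str:
    """GET the quality record as raw JSON text (up to 100 requests/day, no key)."""
    with urllib.request.urlopen(quality_url(owner, repo, category), timeout=10) as resp:
        return resp.read().decode("utf-8")

print(quality_url("philipturner", "metal-flash-attention"))
# prints "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/philipturner/metal-flash-attention"
```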
Higher-rated alternatives
triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
gpu-mode/Triton-Puzzles
Puzzles for learning Triton
hailo-ai/hailo_model_zoo
The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment
open-mmlab/mmdeploy
OpenMMLab Model Deployment Framework
hyperai/tvm-cn
TVM Documentation in Chinese Simplified / TVM 中文文档