kyegomez/FlashMHA
A simple PyTorch implementation of Flash MultiHead Attention
This is a PyTorch library that helps deep learning engineers accelerate their transformer models. It takes in query, key, and value tensors (the building blocks of attention mechanisms) and returns the attention output tensor, but much faster than standard implementations. It's aimed at machine learning engineers and researchers building and training large neural networks, especially for natural language processing or sequence modeling.
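FlashMHA's exact API isn't shown on this page, so as a point of reference, here is a minimal NumPy sketch of the computation it accelerates: scaled dot-product attention, softmax(QKᵀ/√d)·V, applied per batch. The function and variable names here are illustrative, not the library's own.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    # Shapes: q, k, v are (batch, seq_len, head_dim).
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (batch, seq, seq)
    weights = softmax(scores, axis=-1)                # rows sum to 1
    return weights @ v                                # (batch, seq, head_dim)

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 8))
k = rng.standard_normal((2, 4, 8))
v = rng.standard_normal((2, 4, 8))
out = attention(q, k, v)
print(out.shape)  # (2, 4, 8)
```

Flash attention computes the same result as this reference, but tiles the softmax and matrix products to avoid materializing the full (seq × seq) score matrix in GPU memory, which is where the speedup comes from.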
No commits in the last 6 months. Available on PyPI.
Use this if you are a deep learning engineer looking to significantly speed up the attention mechanism calculations within your PyTorch-based transformer models on GPUs.
Not ideal if you are not working with PyTorch, do not need accelerated attention, or are not building deep learning models.
Stars
22
Forks
4
Language
Jupyter Notebook
License
MIT
Category
Last pushed
Feb 05, 2024
Commits (30d)
0
Dependencies
2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/kyegomez/FlashMHA"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
philipperemy/keras-attention
Keras Attention Layer (Luong and Bahdanau scores).
tatp22/linformer-pytorch
My take on a practical implementation of Linformer for Pytorch.
datalogue/keras-attention
Visualizing RNNs using the attention mechanism
ematvey/hierarchical-attention-networks
Document classification with Hierarchical Attention Networks in TensorFlow. WARNING: project is...
thushv89/attention_keras
Keras Layer implementation of Attention for Sequential models