kyegomez/FlashMHA

A simple PyTorch implementation of Flash MultiHead Attention

Score: 45 / 100 (Emerging)

This is a PyTorch library that helps deep learning engineers accelerate their transformer models. It takes in query, key, and value tensors (the building blocks of attention mechanisms) and outputs the attention result faster than a naive attention implementation. It is aimed at machine learning engineers and researchers building and training large neural networks, especially for natural language processing or sequence modeling.
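For reference, the computation the library accelerates is standard scaled dot-product attention: softmax(QKᵀ/√d)·V. The sketch below is not FlashMHA's API (its actual interface is not shown on this page); it is a minimal pure-Python illustration of what a single attention head computes, using plain lists so it runs without any dependencies.

```python
import math

def attention(q, k, v):
    """Scaled dot-product attention for one head, on plain Python lists.

    q, k, v: lists of vectors (list[list[float]]); k and v have equal length.
    Returns one output vector per query: softmax(q @ k^T / sqrt(d)) @ v.
    This illustrates the math only; FlashMHA's point is computing the same
    result on GPU without materializing the full score matrix.
    """
    d = len(q[0])
    out = []
    for qi in q:
        # score of this query against every key, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        # numerically stable softmax over the scores
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # weighted sum of the value vectors
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(len(v[0]))])
    return out

# Tiny example: one query, two key/value pairs
result = attention(q=[[1.0, 0.0]],
                   k=[[1.0, 0.0], [0.0, 1.0]],
                   v=[[1.0, 2.0], [3.0, 4.0]])
```

A flash-style implementation produces the same output but tiles the computation so the full attention matrix never lives in slow GPU memory.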

No commits in the last 6 months. Available on PyPI.

Use this if you are a deep learning engineer looking to significantly speed up the attention mechanism calculations within your PyTorch-based transformer models on GPUs.

Not ideal if you are not working with PyTorch, do not require accelerated attention mechanisms, or are not building deep learning models.

deep-learning neural-networks transformer-models model-optimization GPU-acceleration
Stale 6m
Maintenance 0 / 25
Adoption 6 / 25
Maturity 25 / 25
Community 14 / 25


Stars: 22
Forks: 4
Language: Jupyter Notebook
License: MIT
Last pushed: Feb 05, 2024
Commits (30d): 0
Dependencies: 2

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/kyegomez/FlashMHA"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
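If you want to call the endpoint from Python rather than curl, a small sketch follows. The URL pattern comes from the curl command above; the response field names (`score`, `tier`) are assumptions about the JSON shape, not documented API fields, so the live call is left commented out and a hypothetical sample payload stands in for it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen  # used only by the commented-out live call

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository."""
    return f"{BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

def summarize(payload: dict) -> str:
    """Format headline fields from a response payload.

    'score' and 'tier' are assumed field names for illustration only.
    """
    return f"{payload.get('score', '?')} / 100 ({payload.get('tier', 'unknown')})"

url = quality_url("ml-frameworks", "kyegomez", "FlashMHA")

# Live call (100 requests/day without a key, per the note above):
# payload = json.loads(urlopen(url).read())

# Hypothetical payload, matching the numbers shown on this page:
sample = {"score": 45, "tier": "Emerging"}
print(url)
print(summarize(sample))
```

With an API key, you would raise the limit to 1,000 requests/day; how the key is passed (header or query parameter) is not specified here.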