softmax1/Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
This project offers efficient, numerically stable implementations of the `softmax_n` attention mechanism for transformer models. Given existing transformer code or a pre-trained model, it replaces the standard softmax with `softmax_n`, producing a modified model that may have fewer outliers in its activations and weights. It is aimed at machine learning engineers and researchers working on large language models and other transformer-based architectures.
No commits in the last 6 months. Available on PyPI.
Use this if you are developing or fine-tuning transformer models and want to experiment with `softmax_n` to improve numerical stability or reduce outliers in model activations and weights.
Not ideal if you are not working with transformer models, or if you need GPU features the Triton implementation does not support, such as certain dropout or attention-mask configurations combined with real-valued `n`.
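For intuition, `softmax_n` adds a constant `n` to the softmax denominator, so attention heads can assign (near-)zero total weight when no key is relevant. The sketch below is an illustrative, dependency-free version of that formula and is not the project's actual API; the function name and signature here are hypothetical, and the real package operates on PyTorch models.

```python
import math

def softmax_n(scores, n=1.0):
    """Illustrative softmax_n over a list of scores (not the package's API):
    softmax_n(x)_i = exp(x_i) / (n + sum_j exp(x_j)).

    Stabilized by shifting all scores by m = max(0, max(scores)); the +n term
    then becomes n * exp(-m), so nothing overflows for large scores.
    n = 0 recovers the standard softmax.
    """
    m = max(0.0, max(scores))          # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    denom = n * math.exp(-m) + sum(exps)
    return [e / denom for e in exps]
```

With `n > 0` the outputs sum to less than 1, which is the mechanism's point: the "missing" mass is what lets attention quietly abstain instead of forcing large outlier activations.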
Stars
73
Forks
5
Language
Python
License
GPL-3.0
Category
Last pushed
May 26, 2024
Commits (30d)
0
Dependencies
2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/softmax1/Flash-Attention-Softmax-N"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
lucidrains/x-transformers
A concise but complete full-attention transformer with a set of promising experimental features...
kanishkamisra/minicons
Utility for behavioral and representational analyses of Language Models
lucidrains/simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
lucidrains/dreamer4
Implementation of Danijar's latest iteration for his Dreamer line of work
Nicolepcx/Transformers-in-Action
This is the corresponding code for the book Transformers in Action