softmax1/Flash-Attention-Softmax-N

CUDA and Triton implementations of Flash Attention with SoftmaxN.

Quality score: 42 / 100 (Emerging)

This project provides efficient, numerically stable implementations of the `softmax_n` attention mechanism for transformer models. Given your existing transformer code or a pre-trained model, it replaces the standard softmax with `softmax_n`, producing a modified model that may have fewer activation and weight outliers. It is aimed at machine learning engineers and researchers working on large language models and other transformer-based architectures.
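To make the idea concrete, `softmax_n` adds a constant `n` to the softmax denominator, so attention weights can sum to less than 1 and a head can effectively attend to nothing. A minimal sketch of that formula (this is an illustration of the concept, not this repo's CUDA/Triton API):

```python
import math

def softmax_n(scores, n=1.0):
    # softmax_n(x)_i = exp(x_i) / (n + sum_j exp(x_j)).
    # With n > 0 the outputs can sum to less than 1, letting an
    # attention head emit (near) zero weight everywhere when no
    # token is relevant; n = 0 recovers the standard softmax.
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    # Scale n by exp(-m) so the result equals the exact formula above.
    denom = n * math.exp(-m) + sum(exps)
    return [e / denom for e in exps]

print(sum(softmax_n([1.0, 2.0, 3.0], n=0.0)))   # ~1.0 (standard softmax)
print(sum(softmax_n([-10.0, -10.0], n=1.0)))    # near 0: the head "abstains"
```

The max-subtraction keeps `exp` from overflowing on large scores, which is the same stabilization trick used in standard softmax implementations.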

No commits in the last 6 months. Available on PyPI.

Use this if you are developing or fine-tuning transformer models and want to experiment with `softmax_n` to improve numerical stability or reduce outliers in model activations and weights.

Not ideal if you are not working with transformer models, or if you need GPU features the Triton implementation does not support, such as certain dropout or attention-mask configurations with real-valued `n`.

transformer-models deep-learning model-optimization neural-networks machine-learning-research
Stale (6 months)
Maintenance: 0 / 25
Adoption: 9 / 25
Maturity: 25 / 25
Community: 8 / 25


Stars: 73
Forks: 5
Language: Python
License: GPL-3.0
Last pushed: May 26, 2024
Commits (30d): 0
Dependencies: 2

Get this data via API

```shell
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/softmax1/Flash-Attention-Softmax-N"
```

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
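The same endpoint can be called from a script. A small stdlib sketch that builds the request URL shown in the curl example (the `quality_url` helper is our own illustration, not part of the API; the actual fetch requires network access):

```python
from urllib.parse import quote
from urllib.request import urlopen  # for the actual fetch, if desired

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem, repo):
    # Hypothetical helper mirroring the curl example above;
    # path segments are percent-encoded defensively, keeping the
    # owner/name slash intact.
    return f"{BASE}/{quote(ecosystem)}/{quote(repo, safe='/')}"

url = quality_url("transformers", "softmax1/Flash-Attention-Softmax-N")
print(url)
# To fetch the data (network required):
#   body = urlopen(url).read()
```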