egaoharu-kensei/flash-attention-triton

Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with custom configuration mode

Overall score: 34 / 100 (Emerging)

This project helps machine learning engineers and researchers accelerate large language model training by speeding up the attention mechanism. It takes the query, key, and value tensors from your model and computes attention faster and with less memory overhead. The primary users are those working on deep learning models, especially large transformers, who need to optimize performance on NVIDIA GPUs.

Available on PyPI.
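
As a rough sketch of the intended workflow (the pip distribution name, import path, and function signature below are assumptions, not the project's documented API), a FlashAttention-style kernel is typically called in place of standard scaled-dot-product attention:

# pip install flash-attention-triton   (assumed distribution name)
import torch

# Hypothetical import; the actual module and function names may differ.
from flash_attention_triton import flash_attention

# Shapes: (batch, heads, sequence length, head dim), half precision on GPU.
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Numerically matches standard attention, but runs as fused Triton kernels
# without materializing the full (seq x seq) score matrix.
out = flash_attention(q, k, v, causal=True)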

Use this if you are training large language models or other transformer-based models and need to significantly boost the speed of your attention calculations on NVIDIA GPUs (Turing or newer architectures).

Not ideal if your models do not use the attention mechanism, or if you do not have a compatible NVIDIA GPU (Turing or newer).

deep-learning large-language-models transformer-architecture gpu-optimization model-training
Maintenance 6 / 25
Adoption 6 / 25
Maturity 22 / 25
Community 0 / 25
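
(The overall 34 / 100 appears to be the simple sum of the four components above: 6 + 6 + 22 + 0 = 34.)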


Stars: 21
Forks:
Language: Python
License: MIT
Last pushed: Jan 12, 2026
Commits (30d): 0
Dependencies: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/egaoharu-kensei/flash-attention-triton"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
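
From Python, the same endpoint can be queried with requests; a minimal sketch (the response schema is not documented here, so treat the returned JSON structure as an assumption):

import requests

# Endpoint copied from the curl example above.
url = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/egaoharu-kensei/flash-attention-triton")
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()  # undocumented schema; inspect before relying on fields
print(data)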