Dao-AILab/flash-attention
Fast and memory-efficient exact attention
23,131 stars. Used by 18 other packages. Actively maintained with 54 commits in the last 30 days. Available on PyPI.
Stars: 23,131
Forks: 2,583
Language: Python
License: BSD-3-Clause
Last pushed: Apr 04, 2026
Commits (30d): 54
Dependencies: 2
Reverse dependents: 18
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Dao-AILab/flash-attention"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
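A minimal Python sketch of calling the same endpoint, assuming the URL from the curl example above returns JSON; the Authorization header name used for an API key is hypothetical and should be checked against the API docs.

import requests

API_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/Dao-AILab/flash-attention"

def fetch_quality(api_key=None):
    """Fetch repo quality data; the key is optional (it only raises the rate limit)."""
    headers = {}
    if api_key:
        # Hypothetical header; confirm the real auth scheme with the API docs.
        headers["Authorization"] = f"Bearer {api_key}"
    resp = requests.get(API_URL, headers=headers, timeout=10)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(fetch_quality())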
Related models
wuwangzhang1216/abliterix
Fully automatic censorship removal for language models. LoRA abliteration + Optuna TPE optimization.
lucidrains/deep-cross-attention
Implementation of DeepCrossAttention, as proposed by Heddes et al. at Google Research, in PyTorch
modelscope/mcore-bridge
MCore-Bridge: Providing Megatron-Core model definitions for state-of-the-art large models and...
assembly-automation-hub/repo-governance
⚙️ Reusable GitHub repository governance kit: CI/CD workflows, CodeQL SAST, Dependabot...
zhongkaifu/TensorSharp
A C# inference engine for running large language models (LLMs) locally using GGUF model files....