fla-org/flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models
This project offers highly optimized building blocks for developing next-generation AI models that process very long sequences efficiently. It provides ready-to-use implementations of advanced linear-attention and state-space model architectures. AI researchers and machine learning engineers can use these components to build more powerful and scalable models for tasks such as natural language understanding and time-series prediction.
4,549 stars. Used by 1 other package. Actively maintained with 29 commits in the last 30 days. Available on PyPI.
Use this if you are a machine learning researcher or engineer building large language models or other sequence models and need highly optimized components to process long data sequences more efficiently.
Not ideal if you are looking for a complete, end-user application or a no-code solution for general-purpose AI tasks.
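The efficiency claim above comes from the core idea behind linear attention: dropping the softmax makes attention associative, so a causal model can maintain a small fixed-size state instead of an N x N score matrix. The sketch below is a generic NumPy illustration of that recurrence, not this library's API (the library ships fused Triton/CUDA kernels; names and shapes here are illustrative only).

```python
import numpy as np

# Conceptual sketch of causal linear attention (generic illustration,
# NOT the flash-linear-attention API). Without softmax, attention is
# associative: keep a d x d running state S_t = sum_{s<=t} k_s v_s^T
# and read out o_t = q_t S_t, for O(N) total time in sequence length.
rng = np.random.default_rng(0)
N, d = 16, 8                                  # sequence length, head dim
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))

# Quadratic reference: materializes the full N x N score matrix.
scores = np.tril(Q @ K.T)                     # causal mask
out_quadratic = scores @ V

# Linear recurrence: the same result from a fixed-size state.
S = np.zeros((d, d))
out_linear = np.empty_like(V)
for t in range(N):
    S += np.outer(K[t], V[t])                 # state update
    out_linear[t] = Q[t] @ S                  # readout for position t

assert np.allclose(out_quadratic, out_linear)
```

The assertion passes because both forms compute the same sum, just grouped differently; the recurrence never allocates anything larger than d x d, which is what makes very long sequences tractable.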
Stars: 4,549
Forks: 431
Language: Python
License: MIT
Category:
Last pushed: Mar 12, 2026
Commits (30d): 29
Dependencies: 2
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/fla-org/flash-linear-attention"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
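For scripted access, the example call above can be wrapped in a small helper. The path pattern `/api/v1/quality/<ecosystem>/<owner>/<repo>` is inferred from the single example URL, and the response schema is not documented on this page, so both are assumptions; the sketch only builds the URL and leaves the actual request behind a flag.

```python
import json
import urllib.request

# Base path inferred from the example curl call above (an assumption;
# only the one URL is documented on this page).
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the quality-API URL for one repository."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

url = quality_url("transformers", "fla-org", "flash-linear-attention")
print(url)

FETCH = False  # flip to True to actually hit the endpoint
if FETCH:
    # The JSON field names are undocumented here, so dump the raw body.
    with urllib.request.urlopen(url) as resp:
        print(json.dumps(json.load(resp), indent=2))
```

Within the free tier, the same helper works for any repository on the index, assuming the path pattern generalizes.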
Related models
thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x...
thu-ml/SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
fla-org/flame
🔥 A minimal training framework for scaling FLA models
foundation-model-stack/fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for...
NX-AI/mlstm_kernels
Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.