Sparse Attention Optimization: Transformer Models

We track 26 sparse attention optimization models. One scores above 70 (Verified tier). The highest-rated is fla-org/flash-linear-attention at 76/100 with 4,549 stars. One of the top 10 is actively maintained.

Get all 26 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=sparse-attention-optimization&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
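If you want to work with the results programmatically, here is a minimal Python sketch that fetches the same endpoint and prints each project's name, score, and tier. The response field names used below (`projects`, `name`, `score`, `tier`) are assumptions based on the listing on this page, not documented API fields; inspect the raw JSON and adjust as needed.

```python
import json
import urllib.request

# Public quality endpoint (no API key; rate-limited to 100 requests/day).
URL = (
    "https://pt-edge.onrender.com/api/v1/datasets/quality"
    "?domain=transformers&subcategory=sparse-attention-optimization&limit=26"
)

with urllib.request.urlopen(URL) as resp:
    payload = json.load(resp)

# NOTE: "projects", "name", "score", and "tier" are assumed field names;
# check the actual payload structure and rename accordingly.
for project in payload.get("projects", []):
    print(f"{project['name']}: {project['score']}/100 ({project['tier']})")
```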

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | fla-org/flash-linear-attention | 🚀 Efficient implementations of state-of-the-art linear attention models | 76 | Verified |
| 2 | thu-ml/SageAttention | [ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves... | 57 | Established |
| 3 | thu-ml/SpargeAttn | [ICML2025] SpargeAttention: A training-free sparse attention that... | 54 | Established |
| 4 | fla-org/flame | 🔥 A minimal training framework for scaling FLA models | 52 | Established |
| 5 | foundation-model-stack/fms-fsdp | 🚀 Efficiently (pre)training foundation models with native PyTorch features,... | 52 | Established |
| 6 | NX-AI/mlstm_kernels | Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels. | 44 | Emerging |
| 7 | skylight-org/sparse-attention-hub | Advancing the frontier of efficient AI | 43 | Emerging |
| 8 | XunhaoLai/native-sparse-attention-triton | Efficient triton implementation of Native Sparse Attention. | 41 | Emerging |
| 9 | NVIDIA/Star-Attention | Efficient LLM Inference over Long Sequences | 40 | Emerging |
| 10 | NimbleEdge/sparse_transformers | Sparse Inferencing for transformer based LLMs | 39 | Emerging |
| 11 | Infini-AI-Lab/vortex_torch | Vortex: A Flexible and Efficient Sparse Attention Framework | 38 | Emerging |
| 12 | Bruce-Lee-LY/flash_attention_inference | Performance of the C++ interface of flash attention and flash attention v2... | 37 | Emerging |
| 13 | zhenyi4/ssa | Official repository for "SSA: Sparse Sparse Attention by Aligning Full and... | 36 | Emerging |
| 14 | Relaxed-System-Lab/Flash-Sparse-Attention | 🚀🚀 Efficient implementations of Native Sparse Attention | 36 | Emerging |
| 15 | Bruce-Lee-LY/decoding_attention | Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using... | 35 | Emerging |
| 16 | egaoharu-kensei/flash-attention-triton | Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with... | 34 | Emerging |
| 17 | jlamprou/Infini-Attention | Efficient Infinite Context Transformers with Infini-attention Pytorch... | 27 | Experimental |
| 18 | nanowell/Q-Sparse-LLM | My Implementation of Q-Sparse: All Large Language Models can be Fully... | 26 | Experimental |
| 19 | wesleyscholl/drex | 🦀 The transformer is a brilliant hack scaled past its limits. DREX is what... | 23 | Experimental |
| 20 | AstrolexisAI/MnemoCUDA | Expert streaming inference engine for MoE models larger than VRAM — run... | 22 | Experimental |
| 21 | aymanelrody/FlashMLA | ⚡ Optimize attention mechanisms with FlashMLA, a library of advanced sparse... | 22 | Experimental |
| 22 | NAME0x0/OMNI | PERSPECTIVE v2 — A 1.05 trillion parameter sparse Mixture-of-Experts... | 22 | Experimental |
| 23 | XunhaoLai/ring-sliding-window-attention | Ring sliding window attention implementation with flash attention | 22 | Experimental |
| 24 | kamalrss88/FlashMLA | 🚀 Accelerate attention mechanisms with FlashMLA, featuring optimized kernels... | 21 | Experimental |
| 25 | BICLab/MetaLA | Official implementation of "MetaLA: Unified Optimal Linear Approximation to... | 21 | Experimental |
| 26 | HassanJbara/lin-attn-zoo | Pure PyTorch implementations of popular linear attention models | 13 | Experimental |