XunhaoLai/ring-sliding-window-attention
Ring sliding window attention implementation with flash attention
A specialized tool for machine learning engineers training large language models on very long text sequences. It distributes the attention computation across multiple GPUs so that long contexts can be trained efficiently: you pass in the model's query, key, and value tensors, and it returns the attention output.
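To make the core idea concrete, here is a minimal NumPy sketch of causal sliding-window attention on a single device. This is not the repo's API (which builds on flash attention kernels and multi-GPU communication); the function name and shapes are illustrative assumptions.

```python
import numpy as np

def sliding_window_attention(q, k, v, window):
    """Causal sliding-window attention: query i attends to keys j
    with i - window < j <= i. q, k, v have shape (seq_len, head_dim).
    Illustrative sketch only, not the repo's actual interface."""
    seq_len, head_dim = q.shape
    scores = q @ k.T / np.sqrt(head_dim)

    # Mask out keys outside the causal window.
    idx = np.arange(seq_len)
    in_window = (idx[None, :] <= idx[:, None]) & (idx[None, :] > idx[:, None] - window)
    scores = np.where(in_window, scores, -np.inf)

    # Numerically stable softmax over the allowed keys.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `window=1` each query attends only to itself, so the output equals `v`; larger windows interpolate toward full causal attention while keeping cost linear in sequence length for fixed window size.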
No commits in the last 6 months.
Use this if you are a machine learning engineer or researcher training large language models with very long input sequences and need to leverage multiple GPUs for efficient computation.
Not ideal if you are working with shorter text sequences, or if you are not using a distributed training setup with multiple GPUs.
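The "ring" part of the name refers to how key/value blocks circulate among GPUs: each rank keeps its query block and receives K/V blocks from its neighbor each step, merging partial results with an online softmax. Below is a single-process NumPy simulation of that pattern (no actual GPUs or communication); all names are hypothetical and the sketch computes full rather than windowed attention for simplicity.

```python
import numpy as np

def ring_attention_sim(q_blocks, k_blocks, v_blocks):
    """Simulate ring attention on one process. Each 'rank' r owns
    q_blocks[r]; K/V blocks rotate around the ring, and partial
    attention is merged with an online (streaming) softmax.
    Illustrative sketch, not the repo's implementation."""
    num_ranks = len(q_blocks)
    head_dim = q_blocks[0].shape[-1]
    outputs = []
    for r in range(num_ranks):
        q = q_blocks[r]
        row_max = np.full(q.shape[0], -np.inf)  # running softmax max
        denom = np.zeros(q.shape[0])            # running softmax denominator
        acc = np.zeros_like(q)                  # running weighted-value sum
        for step in range(num_ranks):
            src = (r - step) % num_ranks        # block arriving this step
            k, v = k_blocks[src], v_blocks[src]
            scores = q @ k.T / np.sqrt(head_dim)
            new_max = np.maximum(row_max, scores.max(axis=-1))
            rescale = np.exp(row_max - new_max)  # re-normalize old partials
            p = np.exp(scores - new_max[:, None])
            denom = denom * rescale + p.sum(axis=-1)
            acc = acc * rescale[:, None] + p @ v
            row_max = new_max
        outputs.append(acc / denom[:, None])
    return np.vstack(outputs)
```

After all ring steps, every rank has seen every K/V block exactly once, so the result matches attention over the full concatenated sequence while each device only ever held one block at a time.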
Stars: 9
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Jul 25, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/XunhaoLai/ring-sliding-window-attention"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
fla-org/flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models
thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x...
thu-ml/SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
fla-org/flame
🔥 A minimal training framework for scaling FLA models
foundation-model-stack/fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for...