zhenyi4/ssa
Official repository for "SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space"
This project helps large language model (LLM) developers build more efficient models. It adapts existing LLMs, or trains new ones, to use sparse attention, substantially reducing memory and compute requirements. The result is a model that runs faster and handles much longer inputs without losing accuracy or requiring repeated re-tuning for different text lengths.
Use this if you are developing or deploying large language models and need to improve their efficiency, especially when dealing with very long input texts or operating under strict computational constraints.
Not ideal if you are an end-user of an LLM or simply fine-tuning an existing model for a specific task without needing to modify its core attention mechanism.
Stars
10
Forks
1
Language
Python
License
MIT
Last pushed
Mar 27, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zhenyi4/ssa"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
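If you prefer to call the endpoint from code rather than curl, a minimal sketch using only the Python standard library is below. The base URL comes from the curl example above; the `quality_url` helper and the idea of swapping in other `owner/repo` pairs are assumptions, and the shape of the JSON response is not documented here, so the sketch only builds and fetches the URL.

```python
import urllib.request
from urllib.parse import quote

# Base endpoint, taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a repo (hypothetical helper;
    path segments are URL-escaped for safety)."""
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

def fetch_quality(owner: str, repo: str) -> bytes:
    """Fetch the raw response body; parse as JSON if the API returns it."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return resp.read()

# Reproduces the curl example's URL for this repository.
print(quality_url("zhenyi4", "ssa"))
```

No API key is needed at the 100 requests/day tier; how a key is passed for the higher tier is not documented here.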
Higher-rated alternatives
fla-org/flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models
thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x...
thu-ml/SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
fla-org/flame
🔥 A minimal training framework for scaling FLA models
foundation-model-stack/fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for...