zhenyi4/ssa
Official repository for "SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space"
This project helps large language model (LLM) developers build more efficient models. It adapts existing LLMs, or trains new ones, to use sparse attention, substantially reducing memory and compute requirements. The result is a model that runs faster and handles much longer inputs without losing accuracy or requiring repeated re-tuning for different text lengths.
Use this if you are developing or deploying large language models and need to improve their efficiency, especially when dealing with very long input texts or operating under strict computational constraints.
Not ideal if you are an end-user of an LLM or simply fine-tuning an existing model for a specific task without needing to modify its core attention mechanism.
Stars
10
Forks
1
Language
Python
License
MIT
Last pushed
Mar 27, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zhenyi4/ssa"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
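If you prefer to call the endpoint from code rather than curl, a minimal sketch using only the Python standard library is below. The base URL comes from the curl example above; the `quality_url` helper and the idea of swapping in other `owner/repo` pairs are assumptions, and the shape of the JSON response is not documented here, so the sketch only builds and fetches the URL.

```python
import urllib.request
from urllib.parse import quote

# Base endpoint, taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a repo (hypothetical helper;
    path segments are URL-escaped for safety)."""
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

def fetch_quality(owner: str, repo: str) -> bytes:
    """Fetch the raw response body; parse as JSON if the API returns it."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return resp.read()

# Reproduces the curl example's URL for this repository.
print(quality_url("zhenyi4", "ssa"))
```

No API key is needed at the 100 requests/day tier; how a key is passed for the higher tier is not documented here.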
Higher-rated alternatives
fla-org/flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models
thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x...
thu-ml/SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
fla-org/flame
🔥 A minimal training framework for scaling FLA models
foundation-model-stack/fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for...