thu-ml/SpargeAttn

[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.

Quality score: 54 / 100 (Established)

This is a tool for developers working with large language, image, or video models. It replaces the standard attention mechanism in these models with a faster, more efficient sparse version: given your model's internal data (queries, keys, values), it produces the same attention output significantly faster, with no retraining required. It is aimed at machine learning engineers and researchers who want to accelerate model inference.
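SpargeAttn's own API is not reproduced here. As a point of reference, a minimal dense scaled-dot-product attention in NumPy (shapes and names are illustrative, not from the repository) shows the query/key/value input-output contract that any sparse drop-in replacement must preserve:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    # A sparse replacement such as SpargeAttn must return (numerically
    # close to) this output for the same q, k, v, just computed faster.
    d = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))   # 4 query tokens, head dim 8
k = rng.standard_normal((6, 8))   # 6 key tokens
v = rng.standard_normal((6, 8))   # one value vector per key
out = attention(q, k, v)
print(out.shape)  # (4, 8)
```

Because the contract is purely on inputs and outputs, swapping the dense kernel for a sparse one requires no change to the surrounding model code, which is what makes the replacement training-free.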


Use this if you are a developer looking to speed up the inference time of your existing large models (like those for generating text, images, or video) without complex retraining.

Not ideal if you are a non-technical end-user or if you need to optimize the training phase of your models, as this tool focuses on inference acceleration.

Tags: deep-learning-inference, large-language-models, computer-vision-models, video-generation, model-optimization
No package published · No dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 18 / 25


Stars: 956
Forks: 87
Language: Cuda
License: Apache-2.0
Last pushed: Feb 25, 2026
Commits (last 30 days): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/thu-ml/SpargeAttn"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
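The same endpoint can be called from Python. A minimal stdlib sketch, assuming only the URL shape shown in the curl example above (the `quality_url` helper is hypothetical, not part of any published client):

```python
from urllib.parse import quote

# Base path taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    # Build the quality-score endpoint URL for a GitHub repository,
    # percent-encoding each path segment defensively.
    return f"{BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"

print(quality_url("thu-ml", "SpargeAttn"))
# https://pt-edge.onrender.com/api/v1/quality/transformers/thu-ml/SpargeAttn
```

Fetching the built URL (e.g. with `urllib.request.urlopen`) should return the same data as the curl command, subject to the daily rate limits noted above.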