etasnadi/VulkanCooperativeMatrixAttention

Vulkan & GLSL implementation of FlashAttention-2

Score: 21 / 100 (Experimental)

This project implements the attention mechanism at the core of large language models as Vulkan/GLSL compute shaders. Given the standard query (Q), key (K), and value (V) matrices, it produces the attention output using the FlashAttention-2 algorithm, which is faster and less memory-intensive than the naive computation because it avoids materializing the full attention score matrix. It is aimed at deep learning engineers and researchers developing and deploying neural networks on GPUs.
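For orientation, this is the computation being accelerated: a minimal, naive CPU sketch of single-head scaled dot-product attention, O = softmax(Q Kᵀ / √d) V, assuming row-major matrices. It is illustrative only and is not the repository's tiled Vulkan kernel:

// Minimal CPU reference of single-head scaled dot-product attention:
// O = softmax(Q * K^T / sqrt(d)) * V, all matrices row-major.
// Illustrative sketch only; the repository's Vulkan kernel computes the
// same result with FlashAttention-2 tiling on the GPU.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Q is n x d, K and V are m x d; the result O is n x d.
std::vector<float> attention(const std::vector<float>& Q,
                             const std::vector<float>& K,
                             const std::vector<float>& V,
                             std::size_t n, std::size_t m, std::size_t d) {
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    std::vector<float> O(n * d, 0.0f);
    std::vector<float> s(m);  // one row of attention scores
    for (std::size_t i = 0; i < n; ++i) {
        // Scores s_j = (q_i . k_j) * scale, tracking the max for stability.
        float maxs = -INFINITY;
        for (std::size_t j = 0; j < m; ++j) {
            float dot = 0.0f;
            for (std::size_t k = 0; k < d; ++k)
                dot += Q[i * d + k] * K[j * d + k];
            s[j] = dot * scale;
            maxs = std::max(maxs, s[j]);
        }
        // Numerically stable softmax over the row of scores.
        float sum = 0.0f;
        for (std::size_t j = 0; j < m; ++j) {
            s[j] = std::exp(s[j] - maxs);
            sum += s[j];
        }
        // O_i = sum_j softmax(s)_j * v_j.
        for (std::size_t j = 0; j < m; ++j) {
            const float w = s[j] / sum;
            for (std::size_t k = 0; k < d; ++k)
                O[i * d + k] += w * V[j * d + k];
        }
    }
    return O;
}

FlashAttention-2 produces the same output, but processes K and V in tiles and keeps running softmax statistics, so the full n×m score matrix above never exists in memory.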

No commits in the last 6 months.

Use this if you are a deep learning engineer or researcher needing to accelerate the attention mechanism in your neural network models, especially when working with large models or limited GPU memory.

Not ideal if your models do not use attention mechanisms, or if your hardware does not support Vulkan with the cooperative matrix extension.
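One way to verify the hardware requirement is to check whether the Vulkan physical device advertises a cooperative matrix extension. A minimal sketch, assuming an already-created VkInstance and a chosen VkPhysicalDevice (the helper name is illustrative, not part of this repository):

// VK_KHR_cooperative_matrix is the cross-vendor extension; older NVIDIA
// drivers expose VK_NV_cooperative_matrix instead.
#include <cstring>
#include <vector>
#include <vulkan/vulkan.h>

bool hasCooperativeMatrix(VkPhysicalDevice device) {
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(device, nullptr, &count, nullptr);
    std::vector<VkExtensionProperties> exts(count);
    vkEnumerateDeviceExtensionProperties(device, nullptr, &count, exts.data());
    for (const VkExtensionProperties& e : exts) {
        if (std::strcmp(e.extensionName, "VK_KHR_cooperative_matrix") == 0 ||
            std::strcmp(e.extensionName, "VK_NV_cooperative_matrix") == 0)
            return true;
    }
    return false;
}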

deep-learning-inference neural-network-acceleration large-language-models machine-learning-engineering
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 0 / 25

How are scores calculated?
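The overall score appears to be the sum of the four 25-point components: 0 + 5 + 16 + 0 = 21 / 100.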

Stars: 13

Forks:

Language: C++

License:

Last pushed: Jan 19, 2025

Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/etasnadi/VulkanCooperativeMatrixAttention"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.