etasnadi/VulkanCooperativeMatrixAttention

Vulkan & GLSL implementation of FlashAttention-2

Score: 21 / 100 (Experimental)

This project implements the attention mechanism at the core of large language models as Vulkan/GLSL compute shaders. Given the standard query (Q), key (K), and value (V) matrices, it produces the attention output using the FlashAttention-2 algorithm, which is faster and less memory-intensive than the naive computation because it avoids materializing the full attention score matrix. It is aimed at deep learning engineers and researchers developing and deploying neural networks on GPUs.
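For orientation, this is the computation being accelerated: a minimal, naive CPU sketch of single-head scaled dot-product attention, O = softmax(Q Kᵀ / √d) V, assuming row-major matrices. It is illustrative only and is not the repository's tiled Vulkan kernel:

// Minimal CPU reference of single-head scaled dot-product attention:
// O = softmax(Q * K^T / sqrt(d)) * V, all matrices row-major.
// Illustrative sketch only; the repository's Vulkan kernel computes the
// same result with FlashAttention-2 tiling on the GPU.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Q is n x d, K and V are m x d; the result O is n x d.
std::vector<float> attention(const std::vector<float>& Q,
                             const std::vector<float>& K,
                             const std::vector<float>& V,
                             std::size_t n, std::size_t m, std::size_t d) {
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    std::vector<float> O(n * d, 0.0f);
    std::vector<float> s(m);  // one row of attention scores
    for (std::size_t i = 0; i < n; ++i) {
        // Scores s_j = (q_i . k_j) * scale, tracking the max for stability.
        float maxs = -INFINITY;
        for (std::size_t j = 0; j < m; ++j) {
            float dot = 0.0f;
            for (std::size_t k = 0; k < d; ++k)
                dot += Q[i * d + k] * K[j * d + k];
            s[j] = dot * scale;
            maxs = std::max(maxs, s[j]);
        }
        // Numerically stable softmax over the row of scores.
        float sum = 0.0f;
        for (std::size_t j = 0; j < m; ++j) {
            s[j] = std::exp(s[j] - maxs);
            sum += s[j];
        }
        // O_i = sum_j softmax(s)_j * v_j.
        for (std::size_t j = 0; j < m; ++j) {
            const float w = s[j] / sum;
            for (std::size_t k = 0; k < d; ++k)
                O[i * d + k] += w * V[j * d + k];
        }
    }
    return O;
}

FlashAttention-2 produces the same output, but processes K and V in tiles and keeps running softmax statistics, so the full n×m score matrix above never exists in memory.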

No commits in the last 6 months.

Use this if you are a deep learning engineer or researcher needing to accelerate the attention mechanism in your neural network models, especially when working with large models or limited GPU memory.

Not ideal if your models do not use attention mechanisms, or if your hardware does not support Vulkan with the cooperative matrix extension.
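One way to verify the hardware requirement is to check whether the Vulkan physical device advertises a cooperative matrix extension. A minimal sketch, assuming an already-created VkInstance and a chosen VkPhysicalDevice (the helper name is illustrative, not part of this repository):

// VK_KHR_cooperative_matrix is the cross-vendor extension; older NVIDIA
// drivers expose VK_NV_cooperative_matrix instead.
#include <cstring>
#include <vector>
#include <vulkan/vulkan.h>

bool hasCooperativeMatrix(VkPhysicalDevice device) {
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(device, nullptr, &count, nullptr);
    std::vector<VkExtensionProperties> exts(count);
    vkEnumerateDeviceExtensionProperties(device, nullptr, &count, exts.data());
    for (const VkExtensionProperties& e : exts) {
        if (std::strcmp(e.extensionName, "VK_KHR_cooperative_matrix") == 0 ||
            std::strcmp(e.extensionName, "VK_NV_cooperative_matrix") == 0)
            return true;
    }
    return false;
}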

deep-learning-inference neural-network-acceleration large-language-models machine-learning-engineering
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 0 / 25

How are scores calculated?
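The overall score appears to be the sum of the four 25-point components: 0 + 5 + 16 + 0 = 21 / 100.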

Stars: 13

Forks:

Language: C++

License:

Last pushed: Jan 19, 2025

Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/etasnadi/VulkanCooperativeMatrixAttention"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.