Attention Mechanism Implementations (ML Frameworks)

Implementations and tutorials of attention layers, attention mechanisms, and self-attention architectures for neural networks. Does NOT include broader transformer architectures, vision models, or applications that use attention as a component without focusing on the mechanism itself.

There are 82 attention mechanism implementation projects tracked. Five score above 50 (the established tier). The highest-rated is philipperemy/keras-attention at 61/100, with 2,815 stars.

Get the projects as JSON (raise `limit` to retrieve all 82):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=attention-mechanism-implementations&limit=20"
```

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
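For scripted access, the query string above can be assembled with the standard library. This is a minimal sketch: the `domain`, `subcategory`, and `limit` parameters appear in the curl example above, while the `api_key` parameter name is an assumption and should be checked against the API's documentation.

```python
import urllib.parse

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain, subcategory, limit=20, api_key=None):
    """Assemble a request URL for the quality-dataset endpoint."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    if api_key is not None:
        # Hypothetical parameter name -- verify against the API docs.
        params["api_key"] = api_key
    return BASE + "?" + urllib.parse.urlencode(params)

url = build_url("ml-frameworks", "attention-mechanism-implementations", limit=82)
```

Pass the resulting URL to any HTTP client (`curl`, `requests`, `urllib.request`) to fetch the JSON payload.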

| # | Framework | Description | Score | Tier |
|---|-----------|-------------|-------|------|
| 1 | philipperemy/keras-attention | Keras Attention Layer (Luong and Bahdanau scores). | 61 | Established |
| 2 | tatp22/linformer-pytorch | My take on a practical implementation of Linformer for PyTorch. | 51 | Established |
| 3 | datalogue/keras-attention | Visualizing RNNs using the attention mechanism | 51 | Established |
| 4 | ematvey/hierarchical-attention-networks | Document classification with Hierarchical Attention Networks in TensorFlow... | 51 | Established |
| 5 | thushv89/attention_keras | Keras Layer implementation of Attention for Sequential models | 51 | Established |
| 6 | davidmascharka/tbd-nets | PyTorch implementation of "Transparency by Design: Closing the Gap Between... | 49 | Emerging |
| 7 | soskek/attention_is_all_you_need | Transformer of "Attention Is All You Need" (Vaswani et al. 2017) in Chainer. | 49 | Emerging |
| 8 | lucidrains/fast-weight-attention | Implementation of Fast Weight Attention | 48 | Emerging |
| 9 | balavenkatesh3322/CV-pretrained-model | A collection of computer vision pre-trained models. | 48 | Emerging |
| 10 | brandokoch/attention-is-all-you-need-paper | Original transformer paper: Implementation of Vaswani, Ashish, et al.... | 48 | Emerging |
| 11 | willGuimont/learnable_fourier_positional_encoding | Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding | 48 | Emerging |
| 12 | kushalj001/pytorch-question-answering | Important paper implementations for Question Answering using PyTorch | 47 | Emerging |
| 13 | tlatkowski/multihead-siamese-nets | Implementation of Siamese Neural Networks built upon multihead attention... | 47 | Emerging |
| 14 | kyegomez/FlashMHA | A simple PyTorch implementation of Flash MultiHead Attention | 45 | Emerging |
| 15 | tensorflow/similarity | TensorFlow Similarity is a Python package focused on making similarity... | 45 | Emerging |
| 16 | Ugenteraan/Deep_Hierarchical_Classification | PyTorch Implementation of Deep Hierarchical Classification for Category... | 44 | Emerging |
| 17 | rockerBOO/lora-inspector | LoRA (Low-Rank Adaptation) inspector for Stable Diffusion | 44 | Emerging |
| 18 | lsdefine/attention-is-all-you-need-keras | A Keras+TensorFlow Implementation of the Transformer: Attention Is All You Need | 43 | Emerging |
| 19 | Zhenye-Na/DA-RNN | 📃 Unofficial PyTorch Implementation of DA-RNN (arXiv:1704.02971) | 43 | Emerging |
| 20 | macournoyer/neuralconvo | Neural conversational model in Torch | 43 | Emerging |
| 21 | opengeos/earthformer | A Python package for Earth forecasting transformer | 43 | Emerging |
| 22 | EdGENetworks/attention-networks-for-classification | Hierarchical Attention Networks for Document Classification in PyTorch | 43 | Emerging |
| 23 | szagoruyko/attention-transfer | Improving Convolutional Networks via Attention Transfer (ICLR 2017) | 42 | Emerging |
| 24 | poloclub/dodrio | Exploring attention weights in transformer-based models with linguistic knowledge. | 42 | Emerging |
| 25 | rentainhe/visualization | A collection of visualization functions | 42 | Emerging |
| 26 | cbaziotis/neat-vision | Neat (Neural Attention) Vision is a visualization tool for the attention... | 41 | Emerging |
| 27 | Rishit-dagli/Nystromformer | An implementation of the Nyströmformer, using the Nyström method to approximate... | 40 | Emerging |
| 28 | tatp22/multidim-positional-encoding | An implementation of 1D, 2D, and 3D positional encoding in PyTorch and TensorFlow | 40 | Emerging |
| 29 | sara-nl/attention-sampling-pytorch | A PyTorch implementation of the paper "Processing Megapixel Images... | 40 | Emerging |
| 30 | davidsvy/cosformer-pytorch | Unofficial PyTorch implementation of the paper "cosFormer: Rethinking... | 40 | Emerging |
| 31 | castorini/MP-CNN-Torch | Multi-Perspective Convolutional Neural Networks for modeling textual... | 39 | Emerging |
| 32 | soobinseo/Attentive-Neural-Process | A PyTorch Implementation of Attentive Neural Process | 39 | Emerging |
| 33 | pandeykartikey/Hierarchical-Attention-Network | Implementation of Hierarchical Attention Networks in PyTorch | 38 | Emerging |
| 34 | kyegomez/ShallowFF | Zeta implementation of "Rethinking Attention: Exploring Shallow Feed-Forward... | 37 | Emerging |
| 35 | GalacticExchange/pretrained | Pretrained is the most complete and frequently updated list of pretrained... | 35 | Emerging |
| 36 | Saquib764/omini-kontext | An inference and training framework for multiple image input in Flux Kontext dev | 34 | Emerging |
| 37 | esceptico/perceiver-io | Unofficial implementation of Perceiver IO | 33 | Emerging |
| 38 | SkBlaz/attviz | Dissecting Transformers via attention visualization | 32 | Emerging |
| 39 | billpsomas/efficient-probing | This repo contains the official implementation of the ICLR 2026 paper... | 32 | Emerging |
| 40 | tobna/TaylorShift | This repository contains the code for the paper "TaylorShift: Shifting the... | 31 | Emerging |
| 41 | Rishit-dagli/Compositional-Attention | An implementation of Compositional Attention: Disentangling Search and... | 30 | Emerging |
| 42 | Akrielz/vision_models_playground | Playground for testing and implementing various Vision Models | 30 | Emerging |
| 43 | kyegomez/Tree-Attention-Torch | An implementation of Tree-Attention in PyTorch because it's in JAX for some reason | 30 | Emerging |
| 44 | m-a-n-i-f-e-s-t/power-attention | Attention Kernels for Symmetric Power Transformers | 30 | Emerging |
| 45 | abcamiletto/mmit | A CV library in Python; design and experiment with models using any encoder... | 30 | Emerging |
| 46 | sumo43/miniformer | Minimal Transformer re-implementation inspired by minGPT. Can be used as a... | 29 | Experimental |
| 47 | kyegomez/CT | Implementation of the attention and transformer from "Building Blocks for a... | 28 | Experimental |
| 48 | EricLBuehler/PerceiverIO-Classifier | A classifier based on PerceiverIO | 28 | Experimental |
| 49 | TiagoFilipeSousaGoncalves/survey-attention-medical-imaging | Implementation of the paper "A survey on attention mechanisms for medical... | 27 | Experimental |
| 50 | Rooooyy/HiTIN | Code for the ACL 2023 paper "HiTIN: Hierarchy-aware Tree Isomorphism Network for... | 27 | Experimental |
| 51 | BobMcDear/attention-in-vision | PyTorch implementation of popular attention mechanisms in vision | 24 | Experimental |
| 52 | Lanerra/DWARF | O(N) attention with a bounded inference KV cache. D4 Daubechies wavelet... | 24 | Experimental |
| 53 | MaitySubhajit/KArAt | Kolmogorov-Arnold Attention: Is Learnable Attention Better for Vision Transformers? | 24 | Experimental |
| 54 | ccfco/External-Attention-tensorflow | 🍀 TensorFlow implementation of various Attention Mechanisms, MLP,... | 23 | Experimental |
| 55 | hrbigelow/transformer-aiayn | The Transformer from "Attention is All You Need" | 23 | Experimental |
| 56 | mzuhair9933/PoPE-pytorch | ⚙️ Implement polar coordinate positional embedding in PyTorch for efficient... | 22 | Experimental |
| 57 | Mogalina/transformer | Minimal Transformer implementation in pure C based on the architecture from... | 22 | Experimental |
| 58 | IBM/DEFT | Official PyTorch code for "From PEFT to DEFT: Parameter Efficient Finetuning... | 22 | Experimental |
| 59 | btrojan-official/HypeLoRA | HypeLoRA: Hypernetwork-Generated LoRA Adapters for Calibrated Language Model... | 22 | Experimental |
| 60 | ebrahimpichka/attn-PG-RL-tsp | A PyTorch implementation of the attention-based Policy Gradient RL for... | 21 | Experimental |
| 61 | AlphafromZion/lora-lab | LoRA Training Config Generator: optimal configs for SDXL, FLUX,... | 21 | Experimental |
| 62 | externalPointerVariable/AttentionIsAllYouNeed | Implementing Transformers from Scratch | 20 | Experimental |
| 63 | Iro96/Carbon | Carbon is a pure C++ Transformer framework inspired by GPT, featuring... | 20 | Experimental |
| 64 | biswajitsahoo1111/D2L_Attention_Mechanisms_in_TF | This repository contains TensorFlow 2 code for the Attention Mechanisms chapter... | 19 | Experimental |
| 65 | ducnt2406/AI-Headshot | Easy-to-use toolkit for training LoRA models with SimpleTuner, featuring a... | 18 | Experimental |
| 66 | SCCSMARTCODE/attention-is-all-you-need-from-scratch | A complete implementation of the Transformer architecture from scratch,... | 18 | Experimental |
| 67 | ross-sec/fractal_attention_analysis | A mathematical framework for analyzing transformer attention mechanisms... | 17 | Experimental |
| 68 | pointlander/bento | An aware attention-free simplified image transformer | 17 | Experimental |
| 69 | TiagoFilipeSousaGoncalves/attention-mechanisms-healthcare | Implementation of the paper "Preliminary Study on the Impact of Attention... | 17 | Experimental |
| 70 | wanga90/halonet-pytorch | Implementation of the 😇 Attention layer from the paper, Scaling Local... | 17 | Experimental |
| 71 | sinpoce/ai-trainer-lite | 🤖 Train your own AI model in 3 steps: text classification, image classification, and tabular AutoML; Gradio visual interface; no GPU or ML background required | 15 | Experimental |
| 72 | zhengqigao/hbsattn | A high-performance Block Sparse Attention kernel in Triton | 14 | Experimental |
| 73 | priyanshujiiii/awesome-Attention | Resources and references on solved and unsolved problems in attention mechanisms. | 13 | Experimental |
| 74 | elifsudeates/cnn-pooling-mekanizmalari | CNN pooling, convolution, and attention mechanisms in interactive Jupyter... | 13 | Experimental |
| 75 | nexus-4/self-attention-mechanism | Implementation of self-attention mechanism based on the "Attention is all... | 13 | Experimental |
| 76 | vijaysai1102/polyglot-neural-architecture | A multimodal deep learning project that integrates SQL, MongoDB, Graph, and... | 13 | Experimental |
| 77 | SadhuSoumik/AryanAI | A lightweight, cross-platform transformer model implementation written in... | 12 | Experimental |
| 78 | AttentionSeekers/CNNtention | Can CNNs do better with Attention? | 12 | Experimental |
| 79 | croko22/vit-cpp | An implementation of the Transformer model architecture ("Attention Is All... | 11 | Experimental |
| 80 | ivandustin/selfattention | Self-attention module in JAX | 11 | Experimental |
| 81 | homerjed/set_transformer | Implementation of a Set Transformer in JAX from the paper 'Set Transformer:... | 11 | Experimental |
| 82 | MalayAgr/fast-ats-pytorch | Implementation of "Processing Megapixel Images with Deep Attention-Sampling... | 10 | Experimental |
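Many of the entries above implement variants of the scaled dot-product attention from "Attention Is All You Need" (Vaswani et al. 2017). As a reference point for what these repositories build on, here is a minimal NumPy sketch of the core computation; the shapes and random inputs are illustrative only.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                      # 4 tokens, model dim 8
out, w = scaled_dot_product_attention(x, x, x)   # self-attention: Q = K = V = x
```

In a real layer, Q, K, and V come from learned linear projections of the input (and multi-head variants run several such projections in parallel); the sketch omits those to show only the mechanism itself.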