Attention Mechanism Implementations (ML Frameworks)

Implementations and tutorials of attention layers, attention mechanisms, and self-attention architectures for neural networks. Does NOT include broader transformer architectures, vision models, or applications that use attention as a component without focusing on the mechanism itself.

There are 82 attention mechanism implementation projects tracked. Five score above 50 (the established tier). The highest-rated is philipperemy/keras-attention at 61/100, with 2,815 stars.

Get the projects as JSON (raise `limit` to retrieve all 82):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=attention-mechanism-implementations&limit=20"
```

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
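For scripted access, the query string above can be assembled with the standard library. This is a minimal sketch: the `domain`, `subcategory`, and `limit` parameters appear in the curl example above, while the `api_key` parameter name is an assumption and should be checked against the API's documentation.

```python
import urllib.parse

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain, subcategory, limit=20, api_key=None):
    """Assemble a request URL for the quality-dataset endpoint."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    if api_key is not None:
        # Hypothetical parameter name -- verify against the API docs.
        params["api_key"] = api_key
    return BASE + "?" + urllib.parse.urlencode(params)

url = build_url("ml-frameworks", "attention-mechanism-implementations", limit=82)
```

Pass the resulting URL to any HTTP client (`curl`, `requests`, `urllib.request`) to fetch the JSON payload.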

| # | Framework | Description | Score | Tier |
|---|-----------|-------------|-------|------|
| 1 | philipperemy/keras-attention | Keras Attention Layer (Luong and Bahdanau scores). | 61 | Established |
| 2 | tatp22/linformer-pytorch | My take on a practical implementation of Linformer for PyTorch. | 51 | Established |
| 3 | datalogue/keras-attention | Visualizing RNNs using the attention mechanism | 51 | Established |
| 4 | ematvey/hierarchical-attention-networks | Document classification with Hierarchical Attention Networks in TensorFlow... | 51 | Established |
| 5 | thushv89/attention_keras | Keras Layer implementation of Attention for Sequential models | 51 | Established |
| 6 | davidmascharka/tbd-nets | PyTorch implementation of "Transparency by Design: Closing the Gap Between... | 49 | Emerging |
| 7 | soskek/attention_is_all_you_need | Transformer of "Attention Is All You Need" (Vaswani et al. 2017) in Chainer. | 49 | Emerging |
| 8 | lucidrains/fast-weight-attention | Implementation of Fast Weight Attention | 48 | Emerging |
| 9 | balavenkatesh3322/CV-pretrained-model | A collection of computer vision pre-trained models. | 48 | Emerging |
| 10 | brandokoch/attention-is-all-you-need-paper | Original transformer paper: Implementation of Vaswani, Ashish, et al.... | 48 | Emerging |
| 11 | willGuimont/learnable_fourier_positional_encoding | Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding | 48 | Emerging |
| 12 | kushalj001/pytorch-question-answering | Important paper implementations for Question Answering using PyTorch | 47 | Emerging |
| 13 | tlatkowski/multihead-siamese-nets | Implementation of Siamese Neural Networks built upon multihead attention... | 47 | Emerging |
| 14 | kyegomez/FlashMHA | A simple PyTorch implementation of Flash MultiHead Attention | 45 | Emerging |
| 15 | tensorflow/similarity | TensorFlow Similarity is a Python package focused on making similarity... | 45 | Emerging |
| 16 | Ugenteraan/Deep_Hierarchical_Classification | PyTorch Implementation of Deep Hierarchical Classification for Category... | 44 | Emerging |
| 17 | rockerBOO/lora-inspector | LoRA (Low-Rank Adaptation) inspector for Stable Diffusion | 44 | Emerging |
| 18 | lsdefine/attention-is-all-you-need-keras | A Keras+TensorFlow Implementation of the Transformer: Attention Is All You Need | 43 | Emerging |
| 19 | Zhenye-Na/DA-RNN | 📃 Unofficial PyTorch Implementation of DA-RNN (arXiv:1704.02971) | 43 | Emerging |
| 20 | macournoyer/neuralconvo | Neural conversational model in Torch | 43 | Emerging |
| 21 | opengeos/earthformer | A Python package for Earth forecasting transformer | 43 | Emerging |
| 22 | EdGENetworks/attention-networks-for-classification | Hierarchical Attention Networks for Document Classification in PyTorch | 43 | Emerging |
| 23 | szagoruyko/attention-transfer | Improving Convolutional Networks via Attention Transfer (ICLR 2017) | 42 | Emerging |
| 24 | poloclub/dodrio | Exploring attention weights in transformer-based models with linguistic knowledge. | 42 | Emerging |
| 25 | rentainhe/visualization | A collection of visualization functions | 42 | Emerging |
| 26 | cbaziotis/neat-vision | Neat (Neural Attention) Vision is a visualization tool for the attention... | 41 | Emerging |
| 27 | Rishit-dagli/Nystromformer | An implementation of the Nyströmformer, using the Nyström method to approximate... | 40 | Emerging |
| 28 | tatp22/multidim-positional-encoding | An implementation of 1D, 2D, and 3D positional encoding in PyTorch and TensorFlow | 40 | Emerging |
| 29 | sara-nl/attention-sampling-pytorch | A PyTorch implementation of the paper "Processing Megapixel Images... | 40 | Emerging |
| 30 | davidsvy/cosformer-pytorch | Unofficial PyTorch implementation of the paper "cosFormer: Rethinking... | 40 | Emerging |
| 31 | castorini/MP-CNN-Torch | Multi-Perspective Convolutional Neural Networks for modeling textual... | 39 | Emerging |
| 32 | soobinseo/Attentive-Neural-Process | A PyTorch Implementation of Attentive Neural Process | 39 | Emerging |
| 33 | pandeykartikey/Hierarchical-Attention-Network | Implementation of Hierarchical Attention Networks in PyTorch | 38 | Emerging |
| 34 | kyegomez/ShallowFF | Zeta implementation of "Rethinking Attention: Exploring Shallow Feed-Forward... | 37 | Emerging |
| 35 | GalacticExchange/pretrained | Pretrained is the most complete and frequently updated list of pretrained... | 35 | Emerging |
| 36 | Saquib764/omini-kontext | An inference and training framework for multiple image input in Flux Kontext dev | 34 | Emerging |
| 37 | esceptico/perceiver-io | Unofficial implementation of Perceiver IO | 33 | Emerging |
| 38 | SkBlaz/attviz | Dissecting Transformers via attention visualization | 32 | Emerging |
| 39 | billpsomas/efficient-probing | This repo contains the official implementation of the ICLR 2026 paper... | 32 | Emerging |
| 40 | tobna/TaylorShift | This repository contains the code for the paper "TaylorShift: Shifting the... | 31 | Emerging |
| 41 | Rishit-dagli/Compositional-Attention | An implementation of Compositional Attention: Disentangling Search and... | 30 | Emerging |
| 42 | Akrielz/vision_models_playground | Playground for testing and implementing various Vision Models | 30 | Emerging |
| 43 | kyegomez/Tree-Attention-Torch | An implementation of Tree-Attention in PyTorch because it's in JAX for some reason | 30 | Emerging |
| 44 | m-a-n-i-f-e-s-t/power-attention | Attention Kernels for Symmetric Power Transformers | 30 | Emerging |
| 45 | abcamiletto/mmit | A CV library in Python; design and experiment with models using any encoder... | 30 | Emerging |
| 46 | sumo43/miniformer | Minimal Transformer re-implementation inspired by minGPT. Can be used as a... | 29 | Experimental |
| 47 | kyegomez/CT | Implementation of the attention and transformer from "Building Blocks for a... | 28 | Experimental |
| 48 | EricLBuehler/PerceiverIO-Classifier | A classifier based on PerceiverIO | 28 | Experimental |
| 49 | TiagoFilipeSousaGoncalves/survey-attention-medical-imaging | Implementation of the paper "A survey on attention mechanisms for medical... | 27 | Experimental |
| 50 | Rooooyy/HiTIN | Code for the ACL 2023 paper "HiTIN: Hierarchy-aware Tree Isomorphism Network for... | 27 | Experimental |
| 51 | BobMcDear/attention-in-vision | PyTorch implementation of popular attention mechanisms in vision | 24 | Experimental |
| 52 | Lanerra/DWARF | O(N) attention with a bounded inference KV cache. D4 Daubechies wavelet... | 24 | Experimental |
| 53 | MaitySubhajit/KArAt | Kolmogorov-Arnold Attention: Is Learnable Attention Better for Vision Transformers? | 24 | Experimental |
| 54 | ccfco/External-Attention-tensorflow | 🍀 TensorFlow implementation of various Attention Mechanisms, MLP,... | 23 | Experimental |
| 55 | hrbigelow/transformer-aiayn | The Transformer from "Attention is All You Need" | 23 | Experimental |
| 56 | mzuhair9933/PoPE-pytorch | ⚙️ Implement polar coordinate positional embedding in PyTorch for efficient... | 22 | Experimental |
| 57 | Mogalina/transformer | Minimal Transformer implementation in pure C based on the architecture from... | 22 | Experimental |
| 58 | IBM/DEFT | Official PyTorch code for "From PEFT to DEFT: Parameter Efficient Finetuning... | 22 | Experimental |
| 59 | btrojan-official/HypeLoRA | HypeLoRA: Hypernetwork-Generated LoRA Adapters for Calibrated Language Model... | 22 | Experimental |
| 60 | ebrahimpichka/attn-PG-RL-tsp | A PyTorch implementation of the attention-based Policy Gradient RL for... | 21 | Experimental |
| 61 | AlphafromZion/lora-lab | LoRA Training Config Generator: optimal configs for SDXL, FLUX,... | 21 | Experimental |
| 62 | externalPointerVariable/AttentionIsAllYouNeed | Implementing Transformers from Scratch | 20 | Experimental |
| 63 | Iro96/Carbon | Carbon is a pure C++ Transformer framework inspired by GPT, featuring... | 20 | Experimental |
| 64 | biswajitsahoo1111/D2L_Attention_Mechanisms_in_TF | This repository contains TensorFlow 2 code for the Attention Mechanisms chapter... | 19 | Experimental |
| 65 | ducnt2406/AI-Headshot | Easy-to-use toolkit for training LoRA models with SimpleTuner, featuring a... | 18 | Experimental |
| 66 | SCCSMARTCODE/attention-is-all-you-need-from-scratch | A complete implementation of the Transformer architecture from scratch,... | 18 | Experimental |
| 67 | ross-sec/fractal_attention_analysis | A mathematical framework for analyzing transformer attention mechanisms... | 17 | Experimental |
| 68 | pointlander/bento | An aware attention-free simplified image transformer | 17 | Experimental |
| 69 | TiagoFilipeSousaGoncalves/attention-mechanisms-healthcare | Implementation of the paper "Preliminary Study on the Impact of Attention... | 17 | Experimental |
| 70 | wanga90/halonet-pytorch | Implementation of the 😇 Attention layer from the paper, Scaling Local... | 17 | Experimental |
| 71 | sinpoce/ai-trainer-lite | 🤖 Train your own AI model in 3 steps: text classification, image classification, and tabular AutoML; Gradio visual interface; no GPU or ML background required | 15 | Experimental |
| 72 | zhengqigao/hbsattn | A high-performance Block Sparse Attention kernel in Triton | 14 | Experimental |
| 73 | priyanshujiiii/awesome-Attention | Resources and references on solved and unsolved problems in attention mechanisms. | 13 | Experimental |
| 74 | elifsudeates/cnn-pooling-mekanizmalari | CNN pooling, convolution, and attention mechanisms in interactive Jupyter... | 13 | Experimental |
| 75 | nexus-4/self-attention-mechanism | Implementation of self-attention mechanism based on the "Attention is all... | 13 | Experimental |
| 76 | vijaysai1102/polyglot-neural-architecture | A multimodal deep learning project that integrates SQL, MongoDB, Graph, and... | 13 | Experimental |
| 77 | SadhuSoumik/AryanAI | A lightweight, cross-platform transformer model implementation written in... | 12 | Experimental |
| 78 | AttentionSeekers/CNNtention | Can CNNs do better with Attention? | 12 | Experimental |
| 79 | croko22/vit-cpp | An implementation of the Transformer model architecture ("Attention Is All... | 11 | Experimental |
| 80 | ivandustin/selfattention | Self-attention module in JAX | 11 | Experimental |
| 81 | homerjed/set_transformer | Implementation of a Set Transformer in JAX from the paper 'Set Transformer:... | 11 | Experimental |
| 82 | MalayAgr/fast-ats-pytorch | Implementation of "Processing Megapixel Images with Deep Attention-Sampling... | 10 | Experimental |
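Many of the entries above implement variants of the scaled dot-product attention from "Attention Is All You Need" (Vaswani et al. 2017). As a reference point for what these repositories build on, here is a minimal NumPy sketch of the core computation; the shapes and random inputs are illustrative only.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                      # 4 tokens, model dim 8
out, w = scaled_dot_product_attention(x, x, x)   # self-attention: Q = K = V = x
```

In a real layer, Q, K, and V come from learned linear projections of the input (and multi-head variants run several such projections in parallel); the sketch omits those to show only the mechanism itself.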