Transformer Models: Transformer Architecture Tutorials

Educational implementations and hands-on learning resources covering transformer fundamentals, attention mechanisms, and core architecture components. Does NOT include domain-specific applications (math solving, embeddings, RL), research papers on transformer theory, or production-grade models.

313 transformer architecture tutorial projects are tracked. One scores above 70 (Verified tier). The highest-rated is lucidrains/x-transformers at 79/100 with 5,808 stars. One of the top 10 is actively maintained.

Get all 313 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=transformer-architecture-tutorials&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
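As a minimal sketch of working with the JSON response, the snippet below filters projects by tier and sorts them by score. The field names (`projects`, `name`, `score`, `tier`) are assumptions about the payload shape, not the documented schema; check an actual response before relying on them.

```python
import json

# Hypothetical sample mirroring the ASSUMED shape of the
# /datasets/quality response; real field names may differ.
sample = json.loads("""
{
  "projects": [
    {"name": "lucidrains/x-transformers", "score": 79, "tier": "Verified"},
    {"name": "kanishkamisra/minicons", "score": 67, "tier": "Established"},
    {"name": "tomaarsen/attention_sinks", "score": 49, "tier": "Emerging"}
  ]
}
""")

def by_tier(projects, tier):
    """Return the projects in the given tier, highest score first."""
    return sorted(
        (p for p in projects if p["tier"] == tier),
        key=lambda p: p["score"],
        reverse=True,
    )

established = by_tier(sample["projects"], "Established")
print([p["name"] for p in established])  # → ['kanishkamisra/minicons']
```

The same filtering could be done server-side if the API supports extra query parameters, but that is not shown in the curl example above.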

# Model Score Tier
1 lucidrains/x-transformers

A concise but complete full-attention transformer with a set of promising...

79
Verified
2 kanishkamisra/minicons

Utility for behavioral and representational analyses of Language Models

67
Established
3 lucidrains/simple-hierarchical-transformer

Experiments around a simple idea for inducing multiple hierarchical...

59
Established
4 lucidrains/dreamer4

Implementation of Danijar's latest iteration for his Dreamer line of work

59
Established
5 Nicolepcx/Transformers-in-Action

This is the corresponding code for the book Transformers in Action

53
Established
6 kyegomez/zeta

Build high-performance AI models with modular building blocks

53
Established
7 lucidrains/locoformer

LocoFormer - Generalist Locomotion via Long-Context Adaptation

53
Established
8 Rishit-dagli/Fast-Transformer

An implementation of Additive Attention

51
Established
9 kyegomez/SwitchTransformers

Implementation of Switch Transformers from the paper: "Switch Transformers:...

50
Established
10 gordicaleksa/pytorch-original-transformer

My implementation of the original transformer model (Vaswani et al.). I've...

50
Established
11 tomaarsen/attention_sinks

Extend existing LLMs way beyond the original training length with constant...

49
Emerging
12 dell-research-harvard/linktransformer

A convenient way to link, deduplicate, aggregate and cluster data(frames) in...

49
Emerging
13 HUSTAI/uie_pytorch

PaddleNLP UIE模型的PyTorch版实现

49
Emerging
14 helpmefindaname/transformer-smaller-training-vocab

Temporary remove unused tokens during training to save ram and speed.

48
Emerging
15 kyegomez/HLT

Implementation of the transformer from the paper: "Real-World Humanoid...

48
Emerging
16 tensorops/TransformerX

Flexible Python library providing building blocks (layers) for reproducible...

48
Emerging
17 The-AI-Summer/self-attention-cv

Implementation of various self-attention mechanisms focused on computer...

47
Emerging
18 cedrickchee/awesome-transformer-nlp

A curated list of NLP resources focused on Transformer networks, attention...

47
Emerging
19 jiwidi/Behavior-Sequence-Transformer-Pytorch

This is a pytorch implementation for the BST model from Alibaba...

47
Emerging
20 KRR-Oxford/HierarchyTransformers

Language Models as Hierarchy Encoders

46
Emerging
21 Rishit-dagli/Perceiver

Implementation of Perceiver, General Perception with Iterative Attention

46
Emerging
22 allenai/smashed

SMASHED is a toolkit designed to apply transformations to samples in...

46
Emerging
23 0x7o/RETRO-transformer

Easy-to-use Retrieval-Enhanced Transformer implementation

45
Emerging
24 Lightning-Universe/lightning-transformers

Flexible components pairing 🤗 Transformers with :zap: Pytorch Lightning

45
Emerging
25 marella/ctransformers

Python bindings for the Transformer models implemented in C/C++ using GGML library.

45
Emerging
26 AlignmentResearch/tuned-lens

Tools for understanding how transformer predictions are built layer-by-layer

45
Emerging
27 sgrvinod/chess-transformers

Teaching transformers to play chess

44
Emerging
28 chengzeyi/ParaAttention

https://wavespeed.ai/ Context parallel attention that accelerates DiT model...

44
Emerging
29 google-research/long-range-arena

Long Range Arena for Benchmarking Efficient Transformers

44
Emerging
30 bhavsarpratik/easy-transformers

Utility functions to work with transformers

44
Emerging
31 Emmi-AI/noether

Deep-learning framework for Engineering AI. Built on transformer building...

43
Emerging
32 kyegomez/attn_res

A clean, single-file PyTorch implementation of Attention Residuals (Kimi...

43
Emerging
33 haoliuhl/ringattention

Large Context Attention

43
Emerging
34 lxuechen/private-transformers

A codebase that makes differentially private training of transformers easy.

43
Emerging
35 softmax1/Flash-Attention-Softmax-N

CUDA and Triton implementations of Flash Attention with SoftmaxN.

42
Emerging
36 Rishit-dagli/Conformer

An implementation of Conformer: Convolution-augmented Transformer for Speech...

42
Emerging
37 Beomi/InfiniTransformer

Unofficial PyTorch/🤗Transformers(Gemma/Llama3) implementation of Leave No...

41
Emerging
38 K-H-Ismail/torchortho

[ICLR 2026] Polynomial, trigonometric, and tropical activations

41
Emerging
39 bodeby/torchstack

🫧 probability-level model ensembling for transformers

40
Emerging
40 prajjwal1/fluence

A deep learning library based on Pytorch focussed on low resource language...

40
Emerging
41 jonrbates/turing

A PyTorch library for simulating Turing machines with neural networks, based...

40
Emerging
42 eduard23144/locoformer

🤖 Explore LocoFormer, a Transformer-XL model that enhances robot locomotion...

40
Emerging
43 ziplab/LIT

[AAAI 2022] This is the official PyTorch implementation of "Less is More:...

40
Emerging
44 neulab/knn-transformers

PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling...

40
Emerging
45 dingo-actual/infini-transformer

PyTorch implementation of Infini-Transformer from "Leave No Context Behind:...

40
Emerging
46 cyk1337/Transformer-in-PyTorch

Transformer/Transformer-XL/R-Transformer examples and explanations

39
Emerging
47 clovaai/length-adaptive-transformer

Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021)

39
Emerging
48 naokishibuya/simple_transformer

A Transformer Implementation that is easy to understand and customizable.

39
Emerging
49 kreasof-ai/OpenFormer

A hackable library for running and fine-tuning modern transformer models on...

39
Emerging
50 rafiepour/CTran

Complete code for the proposed CNN-Transformer model for natural language...

39
Emerging
51 Geotrend-research/smaller-transformers

Load What You Need: Smaller Multilingual Transformers for Pytorch and TensorFlow 2.0.

39
Emerging
52 deep-div/Custom-Transformer-Pytorch

A clean, ground-up implementation of the Transformer architecture in...

39
Emerging
53 knotgrass/attention

several types of attention modules written in PyTorch for learning purposes

39
Emerging
54 nihalsangeeth/behaviour-seq-transformer

Pytorch implementation of "Behaviour Sequence Transformer for E-commerce...

39
Emerging
55 chef-transformer/chef-transformer

Chef Transformer 🍲 .

38
Emerging
56 IvanBongiorni/maximal

A TensorFlow-compatible Python library that provides models and layers to...

38
Emerging
57 Kirill-Kravtsov/drophead-pytorch

An implementation of drophead regularization for pytorch transformers

38
Emerging
58 Gurumurthy30/Stackformer

Modular PyTorch transformer library for building, training, and...

38
Emerging
59 ccdv-ai/convert_checkpoint_to_lsg

Efficient Attention for Long Sequence Processing

38
Emerging
60 The-Swarm-Corporation/Hyena-Y

A PyTorch implementation of the Hyena-Y model, a convolution-based...

38
Emerging
61 mohyunho/NAS_transformer

Evolutionary Neural Architecture Search on Transformers for RUL Prediction

37
Emerging
62 iil-postech/semantic-attention

Official implementation of "Attention-aware semantic communications for...

37
Emerging
63 mhw32/prototransformer-public

PyTorch implementation for "ProtoTransformer: A Meta-Learning Approach to...

37
Emerging
64 alexeykarnachev/full_stack_transformer

Pytorch library for end-to-end transformer models training, inference and serving

37
Emerging
65 Selozhd/FNet-tensorflow

Tensorflow Implementation of "FNet: Mixing Tokens with Fourier Transforms."

36
Emerging
66 jaketae/alibi

PyTorch implementation of Train Short, Test Long: Attention with Linear...

36
Emerging
67 antonyvigouret/Pay-Attention-to-MLPs

My implementation of the gMLP model from the paper "Pay Attention to MLPs".

36
Emerging
68 warner-benjamin/commented-transformers

Highly commented implementations of Transformers in PyTorch

36
Emerging
69 saeeddhqan/tiny-transformer

Tiny transformer models implemented in pytorch.

36
Emerging
70 cosbidev/NAIM

Official implementation for the paper ``Not Another Imputation Method: A...

36
Emerging
71 frankaging/ReCOGS

ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of...

36
Emerging
72 arshadshk/SAINT-pytorch

SAINT PyTorch implementation

35
Emerging
73 Baran-phys/Tropical-Attention

[NeurIPS 2025] Official code for "Tropical Attention: Neural Algorithmic...

35
Emerging
74 fattorib/fusedswiglu

Fused SwiGLU Triton kernels

35
Emerging
75 tgautam03/Transformers

A Gentle Introduction to Transformers Neural Network

35
Emerging
76 will-thompson-k/tldr-transformers

The "tl;dr" on a few notable transformer papers (pre-2022).

34
Emerging
77 SakanaAI/evo-memory

Code to train and evaluate Neural Attention Memory Models to obtain...

34
Emerging
78 c00k1ez/plain-transformers

Transformer models implementation for training from scratch.

34
Emerging
79 BubbleJoe-BrownU/TransformerHub

This is a repository of transformer-like models, including Transformer, GPT,...

34
Emerging
80 AkiRusProd/numpy-transformer

A numpy implementation of the Transformer model in "Attention is All You Need"

34
Emerging
81 iKernels/transformers-lightning

A collection of Models, Datasets, DataModules, Callbacks, Metrics, Losses...

34
Emerging
82 hasanisaeed/C-Transformer

Implementation of the core Transformer architecture in pure C

33
Emerging
83 mcbal/deep-implicit-attention

Implementation of deep implicit attention in PyTorch

33
Emerging
84 telekom/transformer-tools

Transformers Training Tools

33
Emerging
85 FareedKhan-dev/Understanding-Transformers-Step-by-Step-math-example

Understanding Large Language Transformer Architecture like a child

32
Emerging
86 templetwo/PhaseGPT

Kuramoto Phase-Coupled Oscillator Attention in Transformers

32
Emerging
87 codyjk/ChessGPT

♟️ A transformer that plays chess 🤖

32
Emerging
88 chris-santiago/met

Reproducing the MET framework with PyTorch

32
Emerging
89 fualsan/TransformerFromScratch

PyTorch Implementation of Transformer Deep Learning Model

32
Emerging
90 RJain12/choformer

Cho codon optimization WIP

32
Emerging
91 MurtyShikhar/TreeProjections

Tool to measure tree-structuredness of the internal algorithm learnt by a...

32
Emerging
92 xdevfaheem/Transformers

A Comprehensive Implementation of Transformers Architecture from Scratch

32
Emerging
93 arshadshk/Last_Query_Transformer_RNN-PyTorch

Implementation of the paper "Last Query Transformer RNN for knowledge...

32
Emerging
94 maxxxzdn/erwin

Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical...

31
Emerging
95 KhaledSharif/robot-transformers

Train and evaluate an Action Chunking Transformer (ACT) to perform...

31
Emerging
96 vmarinowski/infini-attention

An unofficial pytorch implementation of 'Efficient Infinite Context...

31
Emerging
97 crscardellino/argumentation-mining-transformers

Argumentation Mining Transformers Module (AMTM) implementation.

31
Emerging
98 kyegomez/Open-NAMM

An open source implementation of the paper: "AN EVOLVED UNIVERSAL TRANSFORMER MEMORY"

31
Emerging
99 ziansu/codeart

Official repo for FSE'24 paper "CodeArt: Better Code Models by Attention...

31
Emerging
100 Agora-Lab-AI/HydraNet

HydraNet is a state-of-the-art transformer architecture that combines...

31
Emerging
101 NiuTrans/Introduction-to-Transformers

An introduction to basic concepts of Transformers and key techniques of...

31
Emerging
102 garyb9/pytorch-transformers

Transformers architecture code playground repository in python using PyTorch.

31
Emerging
103 mtanghu/LEAP

LEAP: Linear Explainable Attention in Parallel for causal language modeling...

31
Emerging
104 bfilar/URLTran

PyTorch/HuggingFace Implementation of URLTran: Improving Phishing URL...

30
Emerging
105 mfekadu/nimbus-transformer

it's like Nimbus but uses a transformer language model

30
Emerging
106 jaketae/tupe

PyTorch implementation of Rethinking Positional Encoding in Language Pre-training

30
Emerging
107 davide-coccomini/TimeSformer-Video-Classification

The notebook explains the various steps to obtain the results of...

30
Emerging
108 gmontamat/poor-mans-transformers

Implement Transformers (and Deep Learning) from scratch in NumPy

30
Emerging
109 rishabkr/Attention-Is-All-You-Need-Explained-PyTorch

A paper implementation and tutorial from scratch combining various great...

30
Emerging
110 allenai/staged-training

Staged Training for Transformer Language Models

29
Experimental
111 antofuller/configaformers

A python library for highly configurable transformers - easing model...

29
Experimental
112 mcbal/spin-model-transformers

Physics-inspired transformer modules based on mean-field dynamics of...

29
Experimental
113 kazuki-irie/kv-memory-brain

Official Code Repository for the paper "Key-value memory in the brain"

29
Experimental
114 teddykoker/grokking

PyTorch implementation of "Grokking: Generalization Beyond Overfitting on...

29
Experimental
115 NTT123/sketch-transformer

Modeling Draw, Quick! dataset using transformers

29
Experimental
116 dpressel/mint

MinT: Minimal Transformer Library and Tutorials

29
Experimental
117 nullHawk/simple-transformer

Implementation of Transformer model in PyTorch

28
Experimental
118 rahul13ramesh/compositional_capabilities

Compositional Capabilities of Autoregressive Transformers: A Study on...

28
Experimental
119 ArneBinder/pytorch-ie-hydra-template-1

PyTorch-IE Hydra Template

28
Experimental
120 osiriszjq/impulse_init

Convolutional Initialization for Data-Efficient Vision Transformers

28
Experimental
121 Uokoroafor/transformer_from_scratch

This is a PyTorch implementation of the Transformer model in the paper...

27
Experimental
122 declare-lab/KNOT

This repository contains the implementation of the paper -- KNOT: Knowledge...

27
Experimental
123 erfanzar/OST-OpenSourceTransformers

OST Collection: An AI-powered suite of models that predict the next word...

27
Experimental
124 ArtificialZeng/transformers-Explained

官方transformers源码解析。AI大模型时代,pytorch、transformer是新操作系统,其他都是运行在其上面的软件。

27
Experimental
125 somosnlp/the-annotated-transformer

Traducción al español del notebook "The Annotated Transformer" de Harvard...

27
Experimental
126 hmohebbi/ValueZeroing

The official repo for the EACL 2023 paper "Quantifying Context Mixing in...

27
Experimental
127 hrithickcodes/transformer-tf

This repository contains the code for the paper "Attention Is All You Need"...

26
Experimental
128 ays-dev/keras-transformer

Encoder-Decoder Transformer with cross-attention

26
Experimental
129 trialandsuccess/verysimpletransformers

Very Simple Transformers provides a simplified interface for packaging,...

26
Experimental
130 milistu/outformer

Clean Outputs from Language Models

26
Experimental
131 Abhinand20/MathFormer

MathFormer - Solve math equations using NLP and transformers!

25
Experimental
132 Kareem404/hyper-connections

A minimal implementation of Manifold-Constrained Hyper-Connections (mHC)...

25
Experimental
133 kyegomez/Open-Olmo

Unofficial open-source PyTorch implementation of the OLMo Hybrid...

25
Experimental
134 osiriszjq/structured_init

Structured Initialization for Attention in Vision Transformers

25
Experimental
135 princeton-nlp/dyck-transformer

[ACL 2021] Self-Attention Networks Can Process Bounded Hierarchical Languages

24
Experimental
136 ansh-info/Titans-Learning-to-Memorize-at-Test-Time-with-Manim

Visual animated walkthroughs of the DeepMind "Titans: Learning to Memorize...

24
Experimental
137 Bradley-Butcher/Conformers

Unofficial implementation of Conformal Language Modeling by Quach et al

24
Experimental
138 ArpitKadam/Attention-Is-All-You-Code

From Attention Mechanisms to Large Language Models — built from scratch.

24
Experimental
139 shreydan/scratchformers

building various transformer model architectures and its modules from scratch.

24
Experimental
140 afspies/attention-tutorial

Jupyter Notebook tutorial on Attention Mechanisms, Position Embeddings and...

24
Experimental
141 tech-srl/layer_norm_expressivity_role

Code for the paper "On the Expressivity Role of LayerNorm in Transformers'...

23
Experimental
142 danadascalescu00/ioai-transformer-workshop

A hands-on introduction to Transformer architecture, designed for...

23
Experimental
143 Anne-Andresen/Multi-Modal-cuda-C-GAN

Raw C/cuda implementation of 3d GAN

23
Experimental
144 AMDonati/SMC-T-v2

Code for the paper "The Monte Carlo Transformer: a stochastic self-attention...

23
Experimental
145 Brokttv/Transformer-from-scratch

elaborate transformer implementation + detailed explanation

23
Experimental
146 NeuralCoder3/custom_infinite_craft

A custom implementation of Infinite Craft (https://neal.fun/infinite-craft/)

23
Experimental
147 homerjed/transformer_flows

Implementation of Apple ML's Transformer Flow (or TARFlow) from "Normalising...

22
Experimental
148 BoCtrl-C/attention-rollout

Unofficial PyTorch implementation of Attention Rollout

22
Experimental
149 hazdzz/converter

The official PyTorch implementation of Converter.

22
Experimental
150 parham1998/Enhancing-High-Vocabulary-IA-with-a-Novel-Attention-Based-Pooling

Official Pytorch Implementation of: "Enhancing High-Vocabulary Image...

22
Experimental
151 mcbal/afem

Implementation of approximate free-energy minimization in PyTorch

22
Experimental
152 ArshockAbedan/Natural-Language-Processing-with-Attention-Models

Attention Models in NLP

22
Experimental
153 dunktra/attention-binding-a11y

Code for tracking concept emergence via attention-head binding (EB*). Pythia...

22
Experimental
154 hereandnowai/transformers-simplified

Simplified, standalone Python scripts for transformer models, LLMs, TTS,...

22
Experimental
155 shilongdai/ROT5

Small transformer trained from scratch

22
Experimental
156 shubhexists/transformers

basic implementation of transformers

22
Experimental
157 mtingers/kompoz

kompoz: Composable predicate and transform combinators with operator overloading

21
Experimental
158 bikhanal/transformers

The implementation of transformer as presented in the paper "Attention is...

21
Experimental
159 pranoyr/attention-models

Simplified Implementation of SOTA Deep Learning Papers in Pytorch

21
Experimental
160 simboco/flash-linear-attention

💥 Optimize linear attention models with efficient Triton-based...

21
Experimental
161 mingikang31/Fully-Convolutional-Transformers

FCT: Fully Convolutional Transformers

21
Experimental
162 mingikang31/Convolutional-Nearest-Neighbor-Attention

Convolutional Nearest Neighbor Attention for Transformers

21
Experimental
163 marcolacagnina/transformer-for-code-analysis

PyTorch implementation of a Transformer Encoder to predict the Big O time...

21
Experimental
164 gheb02/chess-transformer

This repository implements a KV Cache mechanism in autoregressive...

21
Experimental
165 Johnpaul10j/Transformers-with-keras

Used the keras library to build a transformer using a sequence to sequence...

21
Experimental
166 jdmogollonp/tips-dpt-decoder

Implementation of DeepMind TIPS DPT Decoder

21
Experimental
167 Gala2044/Transformers-for-absolute-dummies

🚀 Master transformers with this simple guide that breaks down complex...

21
Experimental
168 M-e-r-c-u-r-y/pytorch-transformers

Collection of different types of transformers for learning purposes

21
Experimental
169 ozyurtf/attention-and-transformers

The purpose of this project is to understand how the Transformers work and...

21
Experimental
170 KeepALifeUS/ml-attention-mechanisms

Flash Attention, RoPE, multi-head attention for temporal patterns

21
Experimental
171 abc1203/transformer-model

An implementation of the transformer deep learning model, based on the...

21
Experimental
172 Cobkgukgg/forgenn

Modern neural networks in pure NumPy - Transformers, ResNet, and more

21
Experimental
173 gmongaras/Cottention_Transformer

Code for the paper "Cottention: Linear Transformers With Cosine Attention"

20
Experimental
174 Lucasc-99/NoTorch

A from-scratch neural network and transformers library, with speeds rivaling PyTorch

20
Experimental
175 kyegomez/GATS

Implementation of GATS from the paper: "GATS: Gather-Attend-Scatter" in...

20
Experimental
176 Vadimbuildercxx/looped_transformer

Experimental implementation of "Looped Transformers are Better at Learning...

20
Experimental
177 kyegomez/AttnWithConvolutions

Interleaved Attention's with convolutions for text modeling

20
Experimental
178 snoop2head/Deep-Encoder-Shallow-Decoder

🤗 Huggingface Implementation of Kasai et al(2020) "Deep Encoder, Shallow...

20
Experimental
179 frikishaan/pytorch-transformers

This repository contains the original transformers model implementation code.

20
Experimental
180 KOKOSde/sparse-clt

Cross-Layer Transcoder (CLT) library for extracting sparse interpretable...

20
Experimental
181 rajveer43/titan_transformer

Unofficial implementation of titans transformer

20
Experimental
182 kyegomez/Mixture-of-MQA

An implementation of a switch transformer like Multi-query attention model

20
Experimental
183 HySonLab/HierAttention

Scalable Hierarchical Self-Attention with Learnable Hierarchy for Long-Range...

20
Experimental
184 harrisonvshen/triton-accelerated-attention

Custom Triton GPU kernels for multi-head attention, including QK^T, softmax,...

20
Experimental
185 yulang/phrasal-composition-in-transformers

This repo contains datasets and code for Assessing Phrasal Representation...

20
Experimental
186 NathanLeroux-git/OnlineTransformerWithSpikingNeurons

This code is the implementation of the Spiking Online Transformer of the...

20
Experimental
187 kyegomez/MultiQuerySuperpositionAttention

Multi-Query Attention with Sub-linear Masking, Superposition, and Entanglement

19
Experimental
188 pelagecha/typ

Associative Memory Augmentation for Long-Context Retrieval in Transformers

19
Experimental
189 lorenzobalzani/nlp-dl-experiments

Python implementation of Deep Learning models, with a focus on NLP.

19
Experimental
190 moskomule/simple_transformers

Simple transformer implementations that I can understand

19
Experimental
191 awadalaa/transact

An unofficial implementation of "TransAct: Transformer-based Realtime User...

19
Experimental
192 SergioArnaud/attention-is-all-you-need

Implementation of a transformer following the Attention Is All You Need paper

19
Experimental
193 agasheaditya/handson-transformers

End-to-end implementation of Transformers using PyTorch from scratch

19
Experimental
194 VinkuraAI/AXEN-M

AXEN-M (Attention eXtended Efficient Network - Model) is a powerful...

19
Experimental
195 Omikrone/Mnemos

Mnemos is a mini-LLM based on Transformers, designed for training and...

19
Experimental
196 tzhengtek/saute

SAUTE is a lightweight transformer-based architecture adapted for dialog modeling

19
Experimental
197 zzmtsvv/ad-gta

Grouped-Tied Attention by Zadouri, Strauss, Dao (2025).

19
Experimental
198 kikirizki/transformer

Minimalistic PyTorch implementation of transformer

18
Experimental
199 pedrocurvo/HAET

HAET: Hierarchical Attention Erwin Transolver is a hybrid neural...

18
Experimental
200 R2D2-08/turmachpy

A python package for simulating a variety of Turing machines.

18
Experimental
201 CESOIA/transformer-surgeon

Transformer models library with compression options

18
Experimental
202 Jourdelune/Transformer

My implementation of the transformer architecture from the paper "Attention...

18
Experimental
203 BramVanroy/lt3-2019-transformer-trainer

Transformer trainer for variety of classification problems that has been...

18
Experimental
204 dariush-bahrami/mytransformers

My implementation of transformers

18
Experimental
205 Dhyanam04/ByteFetcher

This is ByteFetcher

18
Experimental
206 ariva00/GaussianAttention4Matching

Code for the models described in the paper Localized Gaussians as...

18
Experimental
207 maxime7770/Transformers-Insights

Exploring how Transformers actually transform the data under the hood

18
Experimental
208 graphcore-research/flash-attention-ipu

Poplar implementation of FlashAttention for IPU

18
Experimental
209 hunterhammond-dev/attention-mechanisms-in-transformers

Learn and visualize attention mechanisms in transformer models — inspired by...

18
Experimental
210 Carnetemperrado/x-transformers-rl

x-transformers-rl is a work-in-progress implementation of a transformer for...

18
Experimental
211 Sid7on1/Transformer-256dim

A powerful Transformer architecture built from scratch by Prajwal for...

18
Experimental
212 gustavecortal/transformer

Slides from my NLP course on the transformer architecture

18
Experimental
213 ander-db/Transformers-PytorchLightning

👋 This is my implementation of the Transformer architecture from scratch...

18
Experimental
214 ytgui/SPT-proto

This repo includes a Sparse Transformer implementation which utilizes PQ to...

18
Experimental
215 kyegomez/open-text-embedding-ada-002

This repository presents a production-grade implementation of a...

18
Experimental
216 lmxx1234567/goofy-hydra

Goofy Hydra is a Transport Layer Link Aggregator based on Transformer

18
Experimental
217 tegridydev/hydraform

Self-Evolving Python Transformer Research

18
Experimental
218 Mozeel-V/nebula-mini

Minimal PyTorch-based Nebula pipeline replica for malware behavior modeling

17
Experimental
219 Prakhar-Bhartiya/Transformers_From_Scratch

A walkthrough that builds a Transformer from first principles inside Jupyter...

17
Experimental
220 NipunRathore/NLP-Transformers-from-Scratch

Pre-training a Transformer from scratch.

17
Experimental
221 pplkit/AllYouNeedIsAttention

An efficient and robust implementation of the seminal "Attention Is All You...

17
Experimental
222 hash-ir/transformer-lab

Hands-on implementation of transformer and related models

17
Experimental
223 girishdhegde/NLP

Implementation of Deep Learning based Language Models from scratch in PyTorch

17
Experimental
224 Jayluci4/micro-attention

Attention mechanism in ~50 lines - understand transformers by building from scratch

17
Experimental
225 Ipvikukiepki-KQS/progressive-transformers

A neural network architecture for building conversational agents

17
Experimental
226 devrahulbanjara/Transformers-from-Scratch

A repository implementing Transformers from scratch using PyTorch, designed...

17
Experimental
227 shahrukhx01/transformers-bisected

A repo containing all building blocks of transformer model for text...

17
Experimental
228 thiomajid/distil_xlstm

Learning Attention Mechanisms through Recurrent Structures

17
Experimental
229 PeterJemley/Continuous-Depth-Transformers-with-Learned-Control-Dynamics

Hybrid transformer architecture replacing discrete layers with Neural ODE...

17
Experimental
230 ghubnerr/attention-mechanisms

A compilation of most State-of-the-Art Attention Mechanisms: MHSA, MQA, GQA,...

17
Experimental
231 JHansiduYapa/Transformer-Model-from-Scratch

Build a Transformer model from scratch using Pytorch, implementing key...

17
Experimental
232 pavlosdais/Transformers-Linear-Algebra

Transformer Based Learning of Fundamental Linear Algebra Operations

17
Experimental
233 tom-effernelli/small-LLM

Implementing the 'Attention is all you need' paper through a simple LLM model

17
Experimental
234 microcoder-py/attn-is-all-you-need

A TFX implementation of the paper on transformers, Attention is All You Need

17
Experimental
235 KOKOSde/sparse-transcoder

PyPI package for optimized sparse feature extraction from transformer...

17
Experimental
236 fatou1526/Pytorch_Transformers

This repo contains codes concerning pytorch models from how to define the...

17
Experimental
237 AlperYildirim1/Attention-is-All-You-Need-Pytorch

A fully reproducible, high-performance PyTorch Colab implementation of the...

16
Experimental
238 Sarhamam/ZetaFormer

Curriculum learning framework that uses geometrically structured datasets...

16
Experimental
239 viktor-shcherb/qk-sniffer

Capture sampled Q/K attention vectors from HF transformers into per-branch...

16
Experimental
240 SyedAkramaIrshad/transformer-grokking-lab

Tiny Transformer grokking experiment with live notebook visualizations.

15
Experimental
241 nsarrazin/chessformer

Experiments in chess & transformers

14
Experimental
242 viktor-shcherb/qk-pca-analysis

PCA analysis of Q/K attention vectors to discover position-correlated...

14
Experimental
243 DzmitryPihulski/Encoder-transformer-from-scratch

Fully functional encoder transformer from tokenizer to lm-head

14
Experimental
244 macespinoza/mini-transformer-didactico

Implementación didáctica de un Transformer Encoder–Decoder basada en...

14
Experimental
245 Datta0/nanoformer

A small repo to experiment with Transformer (and more) architectures.

14
Experimental
246 kazuki-irie/hybrid-memory

Official repository for the paper "Blending Complementary Memory Systems in...

14
Experimental
247 dlukeh/transformer-deep-dive

A deep descent into the neural abyss — understanding transformers through...

14
Experimental
248 arvind207kumar/Time-Cross-Adaptive-Self-Attention-TCSA-based-Imputation-model-

Time-Cross Adaptive Self-Attention (TCSA) model for multivariate Time...

14
Experimental
249 robflynnyh/hydra-linear-attention

Implementation of: Hydra Attention: Efficient Attention with Many Heads...

13
Experimental
250 m15kh/Transformer_From_Scratch_Pytorch

Implementation of Transformer from scratch in PyTorch, covering full...

13
Experimental
251 Chamiln17/Transformer-From-Scratch

My implmentation of the transformer architecture described in the paper...

13
Experimental
252 hasnainyaqub/TRANSFORMERS

Transformers are deep learning architectures that use self-attention instead...

13
Experimental
253 isakovaad/fedcsis25

A machine learning project to predict chess puzzle difficulty ratings using...

13
Experimental
254 balamarimuthu/deep-learning-with-pytorch

This repository contains a minimal PyTorch-based Transformer model...

13
Experimental
255 adityakamat24/triton-fast-mha

A high-performance kernel implementation of multi-head attention using...

13
Experimental
256 Joe-Naz01/transformers

A deep learning project that implements and explains the fundamental...

13
Experimental
257 samaraxmmar/transformer-explained

A hands-on guide to understanding and building Transformer models from...

13
Experimental
258 kanenorman/grassmann

Attempt at reproducing "Attention Is Not What You Need: Grassmann Flows as...

13
Experimental
259 Ranjit2111/Transformer-NMT

A PyTorch implementation of the Transformer architecture from "Attention Is...

13
Experimental
260 albertkjoller/transformer-redundancy

Code for the paper "How Redundant Is the Transformer Stack in Speech...

13
Experimental
261 richengguy/calc.ai

Transformer-based Calculator

13
Experimental
262 chaowei312/HyperGraph-Sparse-Attention

Sparse attention via hypergraph partitioning for efficient long-context transformers

13
Experimental
263 sathishkumar67/Byte-Latent-Transformer

Implementation of Byte Latent Transformer

13
Experimental
264 benearnthof/SparseTransformers

Reproducing the Paper Generating Long Sequences with Sparse Transformers by...

13
Experimental
265 MrHenstep/NN_Self_Learn

Neural network architectures from perceptrons to GPT, built and trained from scratch

13
Experimental
266 Projects-Developer/Transformer-Models-For-NLP-Applications

Includes Source Code, PPT, Synopsis, Report, Documents, Base Research Paper...

13
Experimental
267 dsindex/transformers_examples

reference pytorch code for huggingface transformers

13
Experimental
268 santiag0m/traveling-words

Code repository for the paper "Traveling Words: A Geometric Interpretation...

13
Experimental
269 rashi-bhansali/encoder-decoder-transformer-variants-from-scratch

PyTorch implementation of Transformer encoder and GPT-style decoder with...

13
Experimental
270 SimonOuellette35/CountingWithTransformers

Code for paper "Counting and Algorithmic Generalization with Transformers"

12
Experimental
271 1AyaNabil1/attention_is_all_you_need

A clean, well-documented PyTorch implementation of the Transformer

12
Experimental
272 laa-1/machine-translation

A PyTorch-based project that builds a Transformer model and applies it to a translation task, with detailed documentation introducing Transformers...

12
Experimental
273 FromZeroToFanatic/Thoroughly_Understanding_Transformer

Pure hands-on practice: building a "Transformer" from scratch

12
Experimental
274 MarsJacobs/ti-kd-qat

[EACL 2023 main] This Repository provides a Pytorch implementation of...

12
Experimental
275 pier-maker92/pytorch-lightning-Transformer

Pytorch implementation of Transformer wrapped with Pytorch Lightning

12
Experimental
276 Taaniya/Transformers-architecture

This repository contains codes and Jupyter notebooks exploring Transformers...

12
Experimental
277 3xcaffeine/language-model-scratchbook

implementation of modern transformer-based language models from scratch

12
Experimental
278 gmlwns2000/sttabt

[ICLR2023] Official code of Sparse Token Transformer with Attention Back-Tracking

12
Experimental
279 conorhassan/AR-TabPFN

Efficient autoregressive inference for TabPFN models

12
Experimental
280 ajitashwath/attention-is-all-you-need

A practical implementation of Transformer

11
Experimental
281 tailuge/experiments

ChessGPT experiments

11
Experimental
282 vraun0/Transformer

Implementation of the paper Attention Is All You Need (2017) in Pytorch,...

11
Experimental
283 TapasKumarDutta1/Transformer-pytorch

This repository hosts a collection of cutting-edge transformer-based...

11
Experimental
284 Srikar-V675/langgpt

Re-implementation of the paper "Attention Is All You Need" for language translation

11
Experimental
285 Ronnypetson/MagnusFormer

Generation of human-like chess games with deep language models.

11
Experimental
286 plae-tljg/Transformer-Implementation-C-Python

Hand-written transformer code in C, no acceleration

11
Experimental
287 vinhtran2611/transformers

A PyTorch implementation of the Transformer model in "Attention is All You Need".

11
Experimental
288 TristanThorn/seq2seq-transformers-pytorch

A basic seq2seq transformers model trained and validated on the Multi30k dataset.

11
Experimental
289 AbdelrahmanShahrour/Transformers-from-scratch

scratch

11
Experimental
290 bPavan16/nmt

Implementation of Transformers from scratch using pytorch for language...

11
Experimental
291 inseokson/transformers-from-scratch

Implementation of various transformer-based models from scratch

11
Experimental
292 msclock/transformersplus

Add Some plus extra features to transformers

11
Experimental
293 thomas-corcoran/recipetransformer

Utilities to generate recipes using transformers

11
Experimental
294 isaprykin/transformers-sota

Simple from-scratch implementations of transformer-based models that match...

11
Experimental
295 dakofler/compyute_transformer

Developing the transformer modules and functions for Compyute

11
Experimental
296 abideenml/TransformerImplementationfromScratch

My implementation of the "Attention is all you Need" 📝 Transformer model Ⓜ️...

11
Experimental
297 gshashank84/Transformers

Implementation of Transformers

11
Experimental
298 satani99/tinyformers

A concise but fully-featured transformer, complete with a set of promising...

11
Experimental
299 yulang/fine-tuning-and-composition-in-transformers

This repo contains datasets and code for On the Interplay Between...

11
Experimental
300 santiag0m/hopfield-networks

This repository contains simple implementations of the family of Hopfield...

11
Experimental
301 avramdj/transformers-in-pytorch

various popular transformer architectures

11
Experimental
302 petroniocandido/st_nca

Neural Cellular Automata For Large Scale Spatio-Temporal Forecasting

11
Experimental
303 malerbe/Encoders_Explained

Understand the transformer architecture by learning about encoders with...

10
Experimental
304 malojan/executive_climate_change_attention

Repository for the construction of the Executive Climate Change Attention Indicator

10
Experimental
305 eryawww/Gymformer

Gymformer is a PyTorch framework for training Transformer agents in...

10
Experimental
306 mrglaster/transformers-normal-maps-converter

Convert the normal maps used in the game Transformers: Fall of Cybertron to...

10
Experimental
307 im-knots/byte-latent-transformer

An implementation of Meta's Byte Latent Transformer architecture

10
Experimental
308 Factral/winter-attention

Notes about attention and transformers

10
Experimental
309 ehtisham-sadiq/Attention-Mechanisms-From-Theory-to-Implementation

A comprehensive exploration of attention mechanisms, from theoretical...

10
Experimental
310 dmt-zh/Transformers-Full-Review

Total review of Transformer's architecture by example of OpenNMT-tf framework

10
Experimental
311 gszfwsb/Unveiling-Induction-Heads

PyTorch implementation for "Unveiling Induction Heads: Provable Training...

10
Experimental
312 DjangoUncoded/Transformers

This repository contains a clean and modular implementation of a Transformer...

10
Experimental
313 godhunter98/nano_transformers

From scratch implementation of a small transformers language model inspired...

10
Experimental