Transformer Models: Transformer Architecture Tutorials

Educational implementations and hands-on learning resources covering transformer fundamentals, attention mechanisms, and core architecture components. Does NOT include domain-specific applications (math solving, embeddings, RL), research papers on transformer theory, or production-grade models.

313 transformer architecture tutorial projects are tracked. One scores above 70 (Verified tier). The highest-rated is lucidrains/x-transformers at 79/100 with 5,808 stars. One of the top 10 is actively maintained.

Get all 313 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=transformer-architecture-tutorials&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
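As a minimal sketch of working with the JSON response, the snippet below filters projects by tier and sorts them by score. The field names (`projects`, `name`, `score`, `tier`) are assumptions about the payload shape, not the documented schema; check an actual response before relying on them.

```python
import json

# Hypothetical sample mirroring the ASSUMED shape of the
# /datasets/quality response; real field names may differ.
sample = json.loads("""
{
  "projects": [
    {"name": "lucidrains/x-transformers", "score": 79, "tier": "Verified"},
    {"name": "kanishkamisra/minicons", "score": 67, "tier": "Established"},
    {"name": "tomaarsen/attention_sinks", "score": 49, "tier": "Emerging"}
  ]
}
""")

def by_tier(projects, tier):
    """Return the projects in the given tier, highest score first."""
    return sorted(
        (p for p in projects if p["tier"] == tier),
        key=lambda p: p["score"],
        reverse=True,
    )

established = by_tier(sample["projects"], "Established")
print([p["name"] for p in established])  # → ['kanishkamisra/minicons']
```

The same filtering could be done server-side if the API supports extra query parameters, but that is not shown in the curl example above.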

# Model Score Tier
1 lucidrains/x-transformers

A concise but complete full-attention transformer with a set of promising...

79
Verified
2 kanishkamisra/minicons

Utility for behavioral and representational analyses of Language Models

67
Established
3 lucidrains/simple-hierarchical-transformer

Experiments around a simple idea for inducing multiple hierarchical...

59
Established
4 lucidrains/dreamer4

Implementation of Danijar's latest iteration for his Dreamer line of work

59
Established
5 Nicolepcx/Transformers-in-Action

This is the corresponding code for the book Transformers in Action

53
Established
6 kyegomez/zeta

Build high-performance AI models with modular building blocks

53
Established
7 lucidrains/locoformer

LocoFormer - Generalist Locomotion via Long-Context Adaptation

53
Established
8 Rishit-dagli/Fast-Transformer

An implementation of Additive Attention

51
Established
9 kyegomez/SwitchTransformers

Implementation of Switch Transformers from the paper: "Switch Transformers:...

50
Established
10 gordicaleksa/pytorch-original-transformer

My implementation of the original transformer model (Vaswani et al.). I've...

50
Established
11 tomaarsen/attention_sinks

Extend existing LLMs way beyond the original training length with constant...

49
Emerging
12 dell-research-harvard/linktransformer

A convenient way to link, deduplicate, aggregate and cluster data(frames) in...

49
Emerging
13 HUSTAI/uie_pytorch

PaddleNLP UIE模型的PyTorch版实现

49
Emerging
14 helpmefindaname/transformer-smaller-training-vocab

Temporary remove unused tokens during training to save ram and speed.

48
Emerging
15 kyegomez/HLT

Implementation of the transformer from the paper: "Real-World Humanoid...

48
Emerging
16 tensorops/TransformerX

Flexible Python library providing building blocks (layers) for reproducible...

48
Emerging
17 The-AI-Summer/self-attention-cv

Implementation of various self-attention mechanisms focused on computer...

47
Emerging
18 cedrickchee/awesome-transformer-nlp

A curated list of NLP resources focused on Transformer networks, attention...

47
Emerging
19 jiwidi/Behavior-Sequence-Transformer-Pytorch

This is a pytorch implementation for the BST model from Alibaba...

47
Emerging
20 KRR-Oxford/HierarchyTransformers

Language Models as Hierarchy Encoders

46
Emerging
21 Rishit-dagli/Perceiver

Implementation of Perceiver, General Perception with Iterative Attention

46
Emerging
22 allenai/smashed

SMASHED is a toolkit designed to apply transformations to samples in...

46
Emerging
23 0x7o/RETRO-transformer

Easy-to-use Retrieval-Enhanced Transformer implementation

45
Emerging
24 Lightning-Universe/lightning-transformers

Flexible components pairing 🤗 Transformers with :zap: Pytorch Lightning

45
Emerging
25 marella/ctransformers

Python bindings for the Transformer models implemented in C/C++ using GGML library.

45
Emerging
26 AlignmentResearch/tuned-lens

Tools for understanding how transformer predictions are built layer-by-layer

45
Emerging
27 sgrvinod/chess-transformers

Teaching transformers to play chess

44
Emerging
28 chengzeyi/ParaAttention

https://wavespeed.ai/ Context parallel attention that accelerates DiT model...

44
Emerging
29 google-research/long-range-arena

Long Range Arena for Benchmarking Efficient Transformers

44
Emerging
30 bhavsarpratik/easy-transformers

Utility functions to work with transformers

44
Emerging
31 Emmi-AI/noether

Deep-learning framework for Engineering AI. Built on transformer building...

43
Emerging
32 kyegomez/attn_res

A clean, single-file PyTorch implementation of Attention Residuals (Kimi...

43
Emerging
33 haoliuhl/ringattention

Large Context Attention

43
Emerging
34 lxuechen/private-transformers

A codebase that makes differentially private training of transformers easy.

43
Emerging
35 softmax1/Flash-Attention-Softmax-N

CUDA and Triton implementations of Flash Attention with SoftmaxN.

42
Emerging
36 Rishit-dagli/Conformer

An implementation of Conformer: Convolution-augmented Transformer for Speech...

42
Emerging
37 Beomi/InfiniTransformer

Unofficial PyTorch/🤗Transformers(Gemma/Llama3) implementation of Leave No...

41
Emerging
38 K-H-Ismail/torchortho

[ICLR 2026] Polynomial, trigonometric, and tropical activations

41
Emerging
39 bodeby/torchstack

🫧 probability-level model ensembling for transformers

40
Emerging
40 prajjwal1/fluence

A deep learning library based on Pytorch focussed on low resource language...

40
Emerging
41 jonrbates/turing

A PyTorch library for simulating Turing machines with neural networks, based...

40
Emerging
42 eduard23144/locoformer

🤖 Explore LocoFormer, a Transformer-XL model that enhances robot locomotion...

40
Emerging
43 ziplab/LIT

[AAAI 2022] This is the official PyTorch implementation of "Less is More:...

40
Emerging
44 neulab/knn-transformers

PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling...

40
Emerging
45 dingo-actual/infini-transformer

PyTorch implementation of Infini-Transformer from "Leave No Context Behind:...

40
Emerging
46 cyk1337/Transformer-in-PyTorch

Transformer/Transformer-XL/R-Transformer examples and explanations

39
Emerging
47 clovaai/length-adaptive-transformer

Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021)

39
Emerging
48 naokishibuya/simple_transformer

A Transformer Implementation that is easy to understand and customizable.

39
Emerging
49 kreasof-ai/OpenFormer

A hackable library for running and fine-tuning modern transformer models on...

39
Emerging
50 rafiepour/CTran

Complete code for the proposed CNN-Transformer model for natural language...

39
Emerging
51 Geotrend-research/smaller-transformers

Load What You Need: Smaller Multilingual Transformers for Pytorch and TensorFlow 2.0.

39
Emerging
52 deep-div/Custom-Transformer-Pytorch

A clean, ground-up implementation of the Transformer architecture in...

39
Emerging
53 knotgrass/attention

several types of attention modules written in PyTorch for learning purposes

39
Emerging
54 nihalsangeeth/behaviour-seq-transformer

Pytorch implementation of "Behaviour Sequence Transformer for E-commerce...

39
Emerging
55 chef-transformer/chef-transformer

Chef Transformer 🍲 .

38
Emerging
56 IvanBongiorni/maximal

A TensorFlow-compatible Python library that provides models and layers to...

38
Emerging
57 Kirill-Kravtsov/drophead-pytorch

An implementation of drophead regularization for pytorch transformers

38
Emerging
58 Gurumurthy30/Stackformer

Modular PyTorch transformer library for building, training, and...

38
Emerging
59 ccdv-ai/convert_checkpoint_to_lsg

Efficient Attention for Long Sequence Processing

38
Emerging
60 The-Swarm-Corporation/Hyena-Y

A PyTorch implementation of the Hyena-Y model, a convolution-based...

38
Emerging
61 mohyunho/NAS_transformer

Evolutionary Neural Architecture Search on Transformers for RUL Prediction

37
Emerging
62 iil-postech/semantic-attention

Official implementation of "Attention-aware semantic communications for...

37
Emerging
63 mhw32/prototransformer-public

PyTorch implementation for "ProtoTransformer: A Meta-Learning Approach to...

37
Emerging
64 alexeykarnachev/full_stack_transformer

Pytorch library for end-to-end transformer models training, inference and serving

37
Emerging
65 Selozhd/FNet-tensorflow

Tensorflow Implementation of "FNet: Mixing Tokens with Fourier Transforms."

36
Emerging
66 jaketae/alibi

PyTorch implementation of Train Short, Test Long: Attention with Linear...

36
Emerging
67 antonyvigouret/Pay-Attention-to-MLPs

My implementation of the gMLP model from the paper "Pay Attention to MLPs".

36
Emerging
68 warner-benjamin/commented-transformers

Highly commented implementations of Transformers in PyTorch

36
Emerging
69 saeeddhqan/tiny-transformer

Tiny transformer models implemented in pytorch.

36
Emerging
70 cosbidev/NAIM

Official implementation for the paper ``Not Another Imputation Method: A...

36
Emerging
71 frankaging/ReCOGS

ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of...

36
Emerging
72 arshadshk/SAINT-pytorch

SAINT PyTorch implementation

35
Emerging
73 Baran-phys/Tropical-Attention

[NeurIPS 2025] Official code for "Tropical Attention: Neural Algorithmic...

35
Emerging
74 fattorib/fusedswiglu

Fused SwiGLU Triton kernels

35
Emerging
75 tgautam03/Transformers

A Gentle Introduction to Transformers Neural Network

35
Emerging
76 will-thompson-k/tldr-transformers

The "tl;dr" on a few notable transformer papers (pre-2022).

34
Emerging
77 SakanaAI/evo-memory

Code to train and evaluate Neural Attention Memory Models to obtain...

34
Emerging
78 c00k1ez/plain-transformers

Transformer models implementation for training from scratch.

34
Emerging
79 BubbleJoe-BrownU/TransformerHub

This is a repository of transformer-like models, including Transformer, GPT,...

34
Emerging
80 AkiRusProd/numpy-transformer

A numpy implementation of the Transformer model in "Attention is All You Need"

34
Emerging
81 iKernels/transformers-lightning

A collection of Models, Datasets, DataModules, Callbacks, Metrics, Losses...

34
Emerging
82 hasanisaeed/C-Transformer

Implementation of the core Transformer architecture in pure C

33
Emerging
83 mcbal/deep-implicit-attention

Implementation of deep implicit attention in PyTorch

33
Emerging
84 telekom/transformer-tools

Transformers Training Tools

33
Emerging
85 FareedKhan-dev/Understanding-Transformers-Step-by-Step-math-example

Understanding Large Language Transformer Architecture like a child

32
Emerging
86 templetwo/PhaseGPT

Kuramoto Phase-Coupled Oscillator Attention in Transformers

32
Emerging
87 codyjk/ChessGPT

♟️ A transformer that plays chess 🤖

32
Emerging
88 chris-santiago/met

Reproducing the MET framework with PyTorch

32
Emerging
89 fualsan/TransformerFromScratch

PyTorch Implementation of Transformer Deep Learning Model

32
Emerging
90 RJain12/choformer

Cho codon optimization WIP

32
Emerging
91 MurtyShikhar/TreeProjections

Tool to measure tree-structuredness of the internal algorithm learnt by a...

32
Emerging
92 xdevfaheem/Transformers

A Comprehensive Implementation of Transformers Architecture from Scratch

32
Emerging
93 arshadshk/Last_Query_Transformer_RNN-PyTorch

Implementation of the paper "Last Query Transformer RNN for knowledge...

32
Emerging
94 maxxxzdn/erwin

Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical...

31
Emerging
95 KhaledSharif/robot-transformers

Train and evaluate an Action Chunking Transformer (ACT) to perform...

31
Emerging
96 vmarinowski/infini-attention

An unofficial pytorch implementation of 'Efficient Infinite Context...

31
Emerging
97 crscardellino/argumentation-mining-transformers

Argumentation Mining Transformers Module (AMTM) implementation.

31
Emerging
98 kyegomez/Open-NAMM

An open source implementation of the paper: "AN EVOLVED UNIVERSAL TRANSFORMER MEMORY"

31
Emerging
99 ziansu/codeart

Official repo for FSE'24 paper "CodeArt: Better Code Models by Attention...

31
Emerging
100 Agora-Lab-AI/HydraNet

HydraNet is a state-of-the-art transformer architecture that combines...

31
Emerging
101 NiuTrans/Introduction-to-Transformers

An introduction to basic concepts of Transformers and key techniques of...

31
Emerging
102 garyb9/pytorch-transformers

Transformers architecture code playground repository in python using PyTorch.

31
Emerging
103 mtanghu/LEAP

LEAP: Linear Explainable Attention in Parallel for causal language modeling...

31
Emerging
104 bfilar/URLTran

PyTorch/HuggingFace Implementation of URLTran: Improving Phishing URL...

30
Emerging
105 mfekadu/nimbus-transformer

it's like Nimbus but uses a transformer language model

30
Emerging
106 jaketae/tupe

PyTorch implementation of Rethinking Positional Encoding in Language Pre-training

30
Emerging
107 davide-coccomini/TimeSformer-Video-Classification

The notebook explains the various steps to obtain the results of...

30
Emerging
108 gmontamat/poor-mans-transformers

Implement Transformers (and Deep Learning) from scratch in NumPy

30
Emerging
109 rishabkr/Attention-Is-All-You-Need-Explained-PyTorch

A paper implementation and tutorial from scratch combining various great...

30
Emerging
110 allenai/staged-training

Staged Training for Transformer Language Models

29
Experimental
111 antofuller/configaformers

A python library for highly configurable transformers - easing model...

29
Experimental
112 mcbal/spin-model-transformers

Physics-inspired transformer modules based on mean-field dynamics of...

29
Experimental
113 kazuki-irie/kv-memory-brain

Official Code Repository for the paper "Key-value memory in the brain"

29
Experimental
114 teddykoker/grokking

PyTorch implementation of "Grokking: Generalization Beyond Overfitting on...

29
Experimental
115 NTT123/sketch-transformer

Modeling Draw, Quick! dataset using transformers

29
Experimental
116 dpressel/mint

MinT: Minimal Transformer Library and Tutorials

29
Experimental
117 nullHawk/simple-transformer

Implementation of Transformer model in PyTorch

28
Experimental
118 rahul13ramesh/compositional_capabilities

Compositional Capabilities of Autoregressive Transformers: A Study on...

28
Experimental
119 ArneBinder/pytorch-ie-hydra-template-1

PyTorch-IE Hydra Template

28
Experimental
120 osiriszjq/impulse_init

Convolutional Initialization for Data-Efficient Vision Transformers

28
Experimental
121 Uokoroafor/transformer_from_scratch

This is a PyTorch implementation of the Transformer model in the paper...

27
Experimental
122 declare-lab/KNOT

This repository contains the implementation of the paper -- KNOT: Knowledge...

27
Experimental
123 erfanzar/OST-OpenSourceTransformers

OST Collection: An AI-powered suite of models that predict the next word...

27
Experimental
124 ArtificialZeng/transformers-Explained

官方transformers源码解析。AI大模型时代,pytorch、transformer是新操作系统,其他都是运行在其上面的软件。

27
Experimental
125 somosnlp/the-annotated-transformer

Traducción al español del notebook "The Annotated Transformer" de Harvard...

27
Experimental
126 hmohebbi/ValueZeroing

The official repo for the EACL 2023 paper "Quantifying Context Mixing in...

27
Experimental
127 hrithickcodes/transformer-tf

This repository contains the code for the paper "Attention Is All You Need"...

26
Experimental
128 ays-dev/keras-transformer

Encoder-Decoder Transformer with cross-attention

26
Experimental
129 trialandsuccess/verysimpletransformers

Very Simple Transformers provides a simplified interface for packaging,...

26
Experimental
130 milistu/outformer

Clean Outputs from Language Models

26
Experimental
131 Abhinand20/MathFormer

MathFormer - Solve math equations using NLP and transformers!

25
Experimental
132 Kareem404/hyper-connections

A minimal implementation of Manifold-Constrained Hyper-Connections (mHC)...

25
Experimental
133 kyegomez/Open-Olmo

Unofficial open-source PyTorch implementation of the OLMo Hybrid...

25
Experimental
134 osiriszjq/structured_init

Structured Initialization for Attention in Vision Transformers

25
Experimental
135 princeton-nlp/dyck-transformer

[ACL 2021] Self-Attention Networks Can Process Bounded Hierarchical Languages

24
Experimental
136 ansh-info/Titans-Learning-to-Memorize-at-Test-Time-with-Manim

Visual animated walkthroughs of the DeepMind "Titans: Learning to Memorize...

24
Experimental
137 Bradley-Butcher/Conformers

Unofficial implementation of Conformal Language Modeling by Quach et al

24
Experimental
138 ArpitKadam/Attention-Is-All-You-Code

From Attention Mechanisms to Large Language Models — built from scratch.

24
Experimental
139 shreydan/scratchformers

building various transformer model architectures and its modules from scratch.

24
Experimental
140 afspies/attention-tutorial

Jupyter Notebook tutorial on Attention Mechanisms, Position Embeddings and...

24
Experimental
141 tech-srl/layer_norm_expressivity_role

Code for the paper "On the Expressivity Role of LayerNorm in Transformers'...

23
Experimental
142 danadascalescu00/ioai-transformer-workshop

A hands-on introduction to Transformer architecture, designed for...

23
Experimental
143 Anne-Andresen/Multi-Modal-cuda-C-GAN

Raw C/cuda implementation of 3d GAN

23
Experimental
144 AMDonati/SMC-T-v2

Code for the paper "The Monte Carlo Transformer: a stochastic self-attention...

23
Experimental
145 Brokttv/Transformer-from-scratch

elaborate transformer implementation + detailed explanation

23
Experimental
146 NeuralCoder3/custom_infinite_craft

A custom implementation of Infinite Craft (https://neal.fun/infinite-craft/)

23
Experimental
147 homerjed/transformer_flows

Implementation of Apple ML's Transformer Flow (or TARFlow) from "Normalising...

22
Experimental
148 BoCtrl-C/attention-rollout

Unofficial PyTorch implementation of Attention Rollout

22
Experimental
149 hazdzz/converter

The official PyTorch implementation of Converter.

22
Experimental
150 parham1998/Enhancing-High-Vocabulary-IA-with-a-Novel-Attention-Based-Pooling

Official Pytorch Implementation of: "Enhancing High-Vocabulary Image...

22
Experimental
151 mcbal/afem

Implementation of approximate free-energy minimization in PyTorch

22
Experimental
152 ArshockAbedan/Natural-Language-Processing-with-Attention-Models

Attention Models in NLP

22
Experimental
153 dunktra/attention-binding-a11y

Code for tracking concept emergence via attention-head binding (EB*). Pythia...

22
Experimental
154 hereandnowai/transformers-simplified

Simplified, standalone Python scripts for transformer models, LLMs, TTS,...

22
Experimental
155 shilongdai/ROT5

Small transformer trained from scratch

22
Experimental
156 shubhexists/transformers

basic implementation of transformers

22
Experimental
157 mtingers/kompoz

kompoz: Composable predicate and transform combinators with operator overloading

21
Experimental
158 bikhanal/transformers

The implementation of transformer as presented in the paper "Attention is...

21
Experimental
159 pranoyr/attention-models

Simplified Implementation of SOTA Deep Learning Papers in Pytorch

21
Experimental
160 simboco/flash-linear-attention

💥 Optimize linear attention models with efficient Triton-based...

21
Experimental
161 mingikang31/Fully-Convolutional-Transformers

FCT: Fully Convolutional Transformers

21
Experimental
162 mingikang31/Convolutional-Nearest-Neighbor-Attention

Convolutional Nearest Neighbor Attention for Transformers

21
Experimental
163 marcolacagnina/transformer-for-code-analysis

PyTorch implementation of a Transformer Encoder to predict the Big O time...

21
Experimental
164 gheb02/chess-transformer

This repository implements a KV Cache mechanism in autoregressive...

21
Experimental
165 Johnpaul10j/Transformers-with-keras

Used the keras library to build a transformer using a sequence to sequence...

21
Experimental
166 jdmogollonp/tips-dpt-decoder

Implementation of DeepMind TIPS DPT Decoder

21
Experimental
167 Gala2044/Transformers-for-absolute-dummies

🚀 Master transformers with this simple guide that breaks down complex...

21
Experimental
168 M-e-r-c-u-r-y/pytorch-transformers

Collection of different types of transformers for learning purposes

21
Experimental
169 ozyurtf/attention-and-transformers

The purpose of this project is to understand how the Transformers work and...

21
Experimental
170 KeepALifeUS/ml-attention-mechanisms

Flash Attention, RoPE, multi-head attention for temporal patterns

21
Experimental
171 abc1203/transformer-model

An implementation of the transformer deep learning model, based on the...

21
Experimental
172 Cobkgukgg/forgenn

Modern neural networks in pure NumPy - Transformers, ResNet, and more

21
Experimental
173 gmongaras/Cottention_Transformer

Code for the paper "Cottention: Linear Transformers With Cosine Attention"

20
Experimental
174 Lucasc-99/NoTorch

A from-scratch neural network and transformers library, with speeds rivaling PyTorch

20
Experimental
175 kyegomez/GATS

Implementation of GATS from the paper: "GATS: Gather-Attend-Scatter" in...

20
Experimental
176 Vadimbuildercxx/looped_transformer

Experimental implementation of "Looped Transformers are Better at Learning...

20
Experimental
177 kyegomez/AttnWithConvolutions

Interleaved Attention's with convolutions for text modeling

20
Experimental
178 snoop2head/Deep-Encoder-Shallow-Decoder

🤗 Huggingface Implementation of Kasai et al(2020) "Deep Encoder, Shallow...

20
Experimental
179 frikishaan/pytorch-transformers

This repository contains the original transformers model implementation code.

20
Experimental
180 KOKOSde/sparse-clt

Cross-Layer Transcoder (CLT) library for extracting sparse interpretable...

20
Experimental
181 rajveer43/titan_transformer

Unofficial implementation of titans transformer

20
Experimental
182 kyegomez/Mixture-of-MQA

An implementation of a switch transformer like Multi-query attention model

20
Experimental
183 HySonLab/HierAttention

Scalable Hierarchical Self-Attention with Learnable Hierarchy for Long-Range...

20
Experimental
184 harrisonvshen/triton-accelerated-attention

Custom Triton GPU kernels for multi-head attention, including QK^T, softmax,...

20
Experimental
185 yulang/phrasal-composition-in-transformers

This repo contains datasets and code for Assessing Phrasal Representation...

20
Experimental
186 NathanLeroux-git/OnlineTransformerWithSpikingNeurons

This code is the implementation of the Spiking Online Transformer of the...

20
Experimental
187 kyegomez/MultiQuerySuperpositionAttention

Multi-Query Attention with Sub-linear Masking, Superposition, and Entanglement

19
Experimental
188 pelagecha/typ

Associative Memory Augmentation for Long-Context Retrieval in Transformers

19
Experimental
189 lorenzobalzani/nlp-dl-experiments

Python implementation of Deep Learning models, with a focus on NLP.

19
Experimental
190 moskomule/simple_transformers

Simple transformer implementations that I can understand

19
Experimental
191 awadalaa/transact

An unofficial implementation of "TransAct: Transformer-based Realtime User...

19
Experimental
192 SergioArnaud/attention-is-all-you-need

Implementation of a transformer following the Attention Is All You Need paper

19
Experimental
193 agasheaditya/handson-transformers

End-to-end implementation of Transformers using PyTorch from scratch

19
Experimental
194 VinkuraAI/AXEN-M

AXEN-M (Attention eXtended Efficient Network - Model) is a powerful...

19
Experimental
195 Omikrone/Mnemos

Mnemos is a mini-LLM based on Transformers, designed for training and...

19
Experimental
196 tzhengtek/saute

SAUTE is a lightweight transformer-based architecture adapted for dialog modeling

19
Experimental
197 zzmtsvv/ad-gta

Grouped-Tied Attention by Zadouri, Strauss, Dao (2025).

19
Experimental
198 kikirizki/transformer

Minimalistic PyTorch implementation of transformer

18
Experimental
199 pedrocurvo/HAET

HAET: Hierarchical Attention Erwin Transolver is a hybrid neural...

18
Experimental
200 R2D2-08/turmachpy

A python package for simulating a variety of Turing machines.

18
Experimental
201 CESOIA/transformer-surgeon

Transformer models library with compression options

18
Experimental
202 Jourdelune/Transformer

My implementation of the transformer architecture from the paper "Attention...

18
Experimental
203 BramVanroy/lt3-2019-transformer-trainer

Transformer trainer for variety of classification problems that has been...

18
Experimental
204 dariush-bahrami/mytransformers

My implementation of transformers

18
Experimental
205 Dhyanam04/ByteFetcher

This is ByteFetcher

18
Experimental
206 ariva00/GaussianAttention4Matching

Code for the models described in the paper Localized Gaussians as...

18
Experimental
207 maxime7770/Transformers-Insights

Exploring how Transformers actually transform the data under the hood

18
Experimental
208 graphcore-research/flash-attention-ipu

Poplar implementation of FlashAttention for IPU

18
Experimental
209 hunterhammond-dev/attention-mechanisms-in-transformers

Learn and visualize attention mechanisms in transformer models — inspired by...

18
Experimental
210 Carnetemperrado/x-transformers-rl

x-transformers-rl is a work-in-progress implementation of a transformer for...

18
Experimental
211 Sid7on1/Transformer-256dim

A powerful Transformer architecture built from scratch by Prajwal for...

18
Experimental
212 gustavecortal/transformer

Slides from my NLP course on the transformer architecture

18
Experimental
213 ander-db/Transformers-PytorchLightning

👋 This is my implementation of the Transformer architecture from scratch...

18
Experimental
214 ytgui/SPT-proto

This repo includes a Sparse Transformer implementation which utilizes PQ to...

18
Experimental
215 kyegomez/open-text-embedding-ada-002

This repository presents a production-grade implementation of a...

18
Experimental
216 lmxx1234567/goofy-hydra

Goofy Hydra is a Transport Layer Link Aggregator based on Transformer

18
Experimental
217 tegridydev/hydraform

Self-Evolving Python Transformer Research

18
Experimental
218 Mozeel-V/nebula-mini

Minimal PyTorch-based Nebula pipeline replica for malware behavior modeling

17
Experimental
219 Prakhar-Bhartiya/Transformers_From_Scratch

A walkthrough that builds a Transformer from first principles inside Jupyter...

17
Experimental
220 NipunRathore/NLP-Transformers-from-Scratch

Pre-training a Transformer from scratch.

17
Experimental
221 pplkit/AllYouNeedIsAttention

An efficient and robust implementation of the seminal "Attention Is All You...

17
Experimental
222 hash-ir/transformer-lab

Hands-on implementation of transformer and related models

17
Experimental
223 girishdhegde/NLP

Implementation of Deep Learning based Language Models from scratch in PyTorch

17
Experimental
224 Jayluci4/micro-attention

Attention mechanism in ~50 lines - understand transformers by building from scratch

17
Experimental
225 Ipvikukiepki-KQS/progressive-transformers

A neural network architecture for building conversational agents

17
Experimental
226 devrahulbanjara/Transformers-from-Scratch

A repository implementing Transformers from scratch using PyTorch, designed...

17
Experimental
227 shahrukhx01/transformers-bisected

A repo containing all building blocks of transformer model for text...

17
Experimental
228 thiomajid/distil_xlstm

Learning Attention Mechanisms through Recurrent Structures

17
Experimental
229 PeterJemley/Continuous-Depth-Transformers-with-Learned-Control-Dynamics

Hybrid transformer architecture replacing discrete layers with Neural ODE...

17
Experimental
230 ghubnerr/attention-mechanisms

A compilation of most State-of-the-Art Attention Mechanisms: MHSA, MQA, GQA,...

17
Experimental
231 JHansiduYapa/Transformer-Model-from-Scratch

Build a Transformer model from scratch using Pytorch, implementing key...

17
Experimental
232 pavlosdais/Transformers-Linear-Algebra

Transformer Based Learning of Fundamental Linear Algebra Operations

17
Experimental
233 tom-effernelli/small-LLM

Implementing the 'Attention is all you need' paper through a simple LLM model

17
Experimental
234 microcoder-py/attn-is-all-you-need

A TFX implementation of the paper on transformers, Attention is All You Need

17
Experimental
235 KOKOSde/sparse-transcoder

PyPI package for optimized sparse feature extraction from transformer...

17
Experimental
236 fatou1526/Pytorch_Transformers

This repo contains codes concerning pytorch models from how to define the...

17
Experimental
237 AlperYildirim1/Attention-is-All-You-Need-Pytorch

A fully reproducible, high-performance PyTorch Colab implementation of the...

16
Experimental
238 Sarhamam/ZetaFormer

Curriculum learning framework that uses geometrically structured datasets...

16
Experimental
239 viktor-shcherb/qk-sniffer

Capture sampled Q/K attention vectors from HF transformers into per-branch...

16
Experimental
240 SyedAkramaIrshad/transformer-grokking-lab

Tiny Transformer grokking experiment with live notebook visualizations.

15
Experimental
241 nsarrazin/chessformer

Experiments in chess & transformers

14
Experimental
242 viktor-shcherb/qk-pca-analysis

PCA analysis of Q/K attention vectors to discover position-correlated...

14
Experimental
243 DzmitryPihulski/Encoder-transformer-from-scratch

Fully functional encoder transformer from tokenizer to lm-head

14
Experimental
244 macespinoza/mini-transformer-didactico

Implementación didáctica de un Transformer Encoder–Decoder basada en...

14
Experimental
245 Datta0/nanoformer

A small repo to experiment with Transformer (and more) architectures.

14
Experimental
246 kazuki-irie/hybrid-memory

Official repository for the paper "Blending Complementary Memory Systems in...

14
Experimental
247 dlukeh/transformer-deep-dive

A deep descent into the neural abyss — understanding transformers through...

14
Experimental
248 arvind207kumar/Time-Cross-Adaptive-Self-Attention-TCSA-based-Imputation-model-

Time-Cross Adaptive Self-Attention (TCSA) model for multivariate Time...

14
Experimental
249 robflynnyh/hydra-linear-attention

Implementation of: Hydra Attention: Efficient Attention with Many Heads...

13
Experimental
250 m15kh/Transformer_From_Scratch_Pytorch

Implementation of Transformer from scratch in PyTorch, covering full...

13
Experimental
251 Chamiln17/Transformer-From-Scratch

My implmentation of the transformer architecture described in the paper...

13
Experimental
252 hasnainyaqub/TRANSFORMERS

Transformers are deep learning architectures that use self-attention instead...

13
Experimental
253 isakovaad/fedcsis25

A machine learning project to predict chess puzzle difficulty ratings using...

13
Experimental
254 balamarimuthu/deep-learning-with-pytorch

This repository contains a minimal PyTorch-based Transformer model...

13
Experimental
255 adityakamat24/triton-fast-mha

A high-performance kernel implementation of multi-head attention using...

13
Experimental
256 Joe-Naz01/transformers

A deep learning project that implements and explains the fundamental...

13
Experimental
257 samaraxmmar/transformer-explained

A hands-on guide to understanding and building Transformer models from...

13
Experimental
258 kanenorman/grassmann

Attempt at reproducing "Attention Is Not What You Need: Grassmann Flows as...

13
Experimental
259 Ranjit2111/Transformer-NMT

A PyTorch implementation of the Transformer architecture from "Attention Is...

13
Experimental
260 albertkjoller/transformer-redundancy

Code for the paper "How Redundant Is the Transformer Stack in Speech...

13
Experimental
261 richengguy/calc.ai

Transformer-based Calculator

13
Experimental
262 chaowei312/HyperGraph-Sparse-Attention

Sparse attention via hypergraph partitioning for efficient long-context transformers

13
Experimental
263 sathishkumar67/Byte-Latent-Transformer

Implementation of Byte Latent Transformer

13
Experimental
264 benearnthof/SparseTransformers

Reproducing the Paper Generating Long Sequences with Sparse Transformers by...

13
Experimental
265 MrHenstep/NN_Self_Learn

Neural network architectures from perceptrons to GPT, built and trained from scratch

13
Experimental
266 Projects-Developer/Transformer-Models-For-NLP-Applications

Includes Source Code, PPT, Synopsis, Report, Documents, Base Research Paper...

13
Experimental
267 dsindex/transformers_examples

reference pytorch code for huggingface transformers

13
Experimental
268 santiag0m/traveling-words

Code repository for the paper "Traveling Words: A Geometric Interpretation...

13
Experimental
269 rashi-bhansali/encoder-decoder-transformer-variants-from-scratch

PyTorch implementation of Transformer encoder and GPT-style decoder with...

13
Experimental
270 SimonOuellette35/CountingWithTransformers

Code for paper "Counting and Algorithmic Generalization with Transformers"

12
Experimental
271 1AyaNabil1/attention_is_all_you_need

A clean, well-documented PyTorch implementation of the Transformer

12
Experimental
272 laa-1/machine-translation

A PyTorch-based project that builds a Transformer model and applies it to a translation task, with detailed documentation introducing Transformers...

12
Experimental
273 FromZeroToFanatic/Thoroughly_Understanding_Transformer

Pure hands-on practice: building a "Transformer" from scratch

12
Experimental
274 MarsJacobs/ti-kd-qat

[EACL 2023 main] This Repository provides a Pytorch implementation of...

12
Experimental
275 pier-maker92/pytorch-lightning-Transformer

Pytorch implementation of Transformer wrapped with Pytorch Lightning

12
Experimental
276 Taaniya/Transformers-architecture

This repository contains codes and Jupyter notebooks exploring Transformers...

12
Experimental
277 3xcaffeine/language-model-scratchbook

implementation of modern transformer-based language models from scratch

12
Experimental
278 gmlwns2000/sttabt

[ICLR2023] Official code of Sparse Token Transformer with Attention Back-Tracking

12
Experimental
279 conorhassan/AR-TabPFN

Efficient autoregressive inference for TabPFN models

12
Experimental
280 ajitashwath/attention-is-all-you-need

A practical implementation of Transformer

11
Experimental
281 tailuge/experiments

ChessGPT experiments

11
Experimental
282 vraun0/Transformer

Implementation of the paper Attention Is All You Need (2017) in Pytorch,...

11
Experimental
283 TapasKumarDutta1/Transformer-pytorch

This repository hosts a collection of cutting-edge transformer-based...

11
Experimental
284 Srikar-V675/langgpt

Re-implementation of the paper "Attention Is All You Need" for language translation

11
Experimental
285 Ronnypetson/MagnusFormer

Generation of human-like chess games with deep language models.

11
Experimental
286 plae-tljg/Transformer-Implementation-C-Python

Hand-written transformer code in C, no acceleration

11
Experimental
287 vinhtran2611/transformers

A PyTorch implementation of the Transformer model in "Attention is All You Need".

11
Experimental
288 TristanThorn/seq2seq-transformers-pytorch

A basic seq2seq transformers model trained and validated on the Multi30k dataset.

11
Experimental
289 AbdelrahmanShahrour/Transformers-from-scratch

scratch

11
Experimental
290 bPavan16/nmt

Implementation of Transformers from scratch using pytorch for language...

11
Experimental
291 inseokson/transformers-from-scratch

Implementation of various transformer-based models from scratch

11
Experimental
292 msclock/transformersplus

Add Some plus extra features to transformers

11
Experimental
293 thomas-corcoran/recipetransformer

Utilities to generate recipes using transformers

11
Experimental
294 isaprykin/transformers-sota

Simple from-scratch implementations of transformer-based models that match...

11
Experimental
295 dakofler/compyute_transformer

Developing the transformer modules and functions for Compyute

11
Experimental
296 abideenml/TransformerImplementationfromScratch

My implementation of the "Attention is all you Need" 📝 Transformer model Ⓜ️...

11
Experimental
297 gshashank84/Transformers

Implementation of Transformers

11
Experimental
298 satani99/tinyformers

A concise but fully-featured transformer, complete with a set of promising...

11
Experimental
299 yulang/fine-tuning-and-composition-in-transformers

This repo contains datasets and code for On the Interplay Between...

11
Experimental
300 santiag0m/hopfield-networks

This repository contains simple implementations of the family of Hopfield...

11
Experimental
301 avramdj/transformers-in-pytorch

various popular transformer architectures

11
Experimental
302 petroniocandido/st_nca

Neural Cellular Automata For Large Scale Spatio-Temporal Forecasting

11
Experimental
303 malerbe/Encoders_Explained

Understand the transformer architecture by learning about encoders with...

10
Experimental
304 malojan/executive_climate_change_attention

Repository for the construction of the Executive Climate Change Attention Indicator

10
Experimental
305 eryawww/Gymformer

Gymformer is a PyTorch framework for training Transformer agents in...

10
Experimental
306 mrglaster/transformers-normal-maps-converter

Convert the normal maps used in the game Transformers: Fall of Cybertron to...

10
Experimental
307 im-knots/byte-latent-transformer

An implementation of Meta's Byte Latent Transformer architecture

10
Experimental
308 Factral/winter-attention

Notes about attention and transformers

10
Experimental
309 ehtisham-sadiq/Attention-Mechanisms-From-Theory-to-Implementation

A comprehensive exploration of attention mechanisms, from theoretical...

10
Experimental
310 dmt-zh/Transformers-Full-Review

Total review of Transformer's architecture by example of OpenNMT-tf framework

10
Experimental
311 gszfwsb/Unveiling-Induction-Heads

PyTorch implementation for "Unveiling Induction Heads: Provable Training...

10
Experimental
312 DjangoUncoded/Transformers

This repository contains a clean and modular implementation of a Transformer...

10
Experimental
313 godhunter98/nano_transformers

From scratch implementation of a small transformers language model inspired...

10
Experimental