Transformer Architecture Education Models

There are 63 transformer architecture education models tracked. One scores above 70 (Verified tier). The highest-rated is huggingface/transformers at 87/100 with 157,811 stars. One of the top 10 is actively maintained.

Get all 63 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=transformer-architecture-education&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
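The same request can be sketched in Python using only the standard library. The base URL and the `domain`, `subcategory`, and `limit` query parameters are taken from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the sketch returns the decoded payload without interpreting it. Whether a `limit` above 20 returns all 63 projects in one call is an assumption that would need to be checked against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain, subcategory, limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(domain="transformers",
                   subcategory="transformer-architecture-education",
                   limit=20,
                   timeout=30):
    """Fetch one page of results and return the decoded JSON as-is.

    The response structure is not assumed; inspect the returned object
    before relying on any particular field.
    """
    with urlopen(build_url(domain, subcategory, limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` performs a single unauthenticated request, which counts against the 100 requests/day allowance, so caching the result locally is a reasonable choice when experimenting.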

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | huggingface/transformers | 🤗 Transformers: the model-definition framework for state-of-the-art machine... | 87 | Verified |
| 2 | kyegomez/LongNet | Implementation of plug in and play Attention from "LongNet: Scaling... | 51 | Established |
| 3 | pbloem/former | Simple transformer implementation from scratch in pytorch. (archival, latest... | 49 | Emerging |
| 4 | NVIDIA/FasterTransformer | Transformer related optimization, including BERT, GPT | 48 | Emerging |
| 5 | kyegomez/SimplifiedTransformers | SimplifiedTransformer simplifies transformer block without affecting... | 47 | Emerging |
| 6 | ARM-software/keyword-transformer | Official implementation of the Keyword Transformer: https://arxiv.org/abs/2104.00769 | 47 | Emerging |
| 7 | ChangwenXu98/TransPolymer | Implementation of "TransPolymer: a Transformer-based language model for... | 45 | Emerging |
| 8 | IBM/regression-transformer | Regression Transformer (2023; Nature Machine Intelligence) | 45 | Emerging |
| 9 | bytedance/effective_transformer | Running BERT without Padding | 44 | Emerging |
| 10 | bayesgroup/code_transformers | Empirical Study of Transformers for Source Code & A Simple Approach for... | 43 | Emerging |
| 11 | ShivamRajSharma/Transformer-Architectures-From-Scratch | Implementation of transformers based architecture in PyTorch. | 43 | Emerging |
| 12 | dashstander/block-recurrent-transformer | Pytorch implementation of "Block Recurrent Transformers" (Hutchins & Schlag... | 41 | Emerging |
| 13 | Breeze648/Transformer-from-Scratch | This repository is positioned as an AI paper reproduction / from-scratch Transformer implementation. ... | 41 | Emerging |
| 14 | octanove/shiba | Pytorch implementation and pre-trained Japanese model for CANINE, the... | 41 | Emerging |
| 15 | YadaYuki/transformer-from-scratch | Transformer from scratch 🙊 (English to Japanese Translator by PyTorch) | 40 | Emerging |
| 16 | Whiax/BERT-Transformer-Pytorch | Basic implementation of BERT and Transformer in Pytorch in one short python... | 38 | Emerging |
| 17 | pmichel31415/are-16-heads-really-better-than-1 | Code for the paper "Are Sixteen Heads Really Better than One?" | 38 | Emerging |
| 18 | dcaffo98/transpormer | TranSPormer: a transformer for the Travelling Salesman Problem | 38 | Emerging |
| 19 | amazon-science/transformers-data-augmentation | Code associated with the "Data Augmentation using Pre-trained Transformer... | 37 | Emerging |
| 20 | THUDM/Multilingual-GLM | The multilingual variant of GLM, a general language model trained with... | 35 | Emerging |
| 21 | forgi86/sysid-transformers | Code to reproduce the results of the paper In-context learning for... | 34 | Emerging |
| 22 | nanowell/Differential-Transformer-PyTorch | PyTorch implementation of the Differential-Transformer architecture for... | 34 | Emerging |
| 23 | submarat/removing-layer-norm | Transformers Don’t Need LayerNorm at Inference Time | 33 | Emerging |
| 24 | chrisjob1021/transformer-examples | A collection of educational toy implementations and examples of key... | 33 | Emerging |
| 25 | shamspias/Transformers-and-Large-Language-Models-From-Basics-to-Frontier-Research | Dive into the transformative world of NLP with this guide on Transformers.... | 32 | Emerging |
| 26 | IParraMartin/An-Explanation-Is-All-You-Need | The original transformer implementation from scratch. It contains... | 31 | Emerging |
| 27 | LoserCheems/WonderfulMatrices | Wonderful Matrices to Build Small Language Models | 29 | Experimental |
| 28 | fabienfrfr/tptt | 😊 TPTT: Transforming Pretrained Transformers into Titans | 29 | Experimental |
| 29 | HSaurabh0919/CTransformers | Implementing wide variety of transformers, fine tuning as well as trying... | 29 | Experimental |
| 30 | kyegomez/MLXTransformer | Simple Implementation of a Transformer in the new framework MLX by Apple | 27 | Experimental |
| 31 | januverma/transformers-stuff | Codes, scripts, and notebooks on various aspects of transformer models. | 27 | Experimental |
| 32 | BruinGrowly/URI_Transformer | URI-Transformer: Universal Reality Interface - A revolutionary artificial... | 26 | Experimental |
| 33 | SauravP97/toy-transformer | A decoder only Transformer implementing masked attention | 24 | Experimental |
| 34 | abgache/NanoGPL | Small test generative pre-trained LAM (Linear Attention Mechanism). | 24 | Experimental |
| 35 | daniel-furman/polyglot-or-not | Are foundation LMs multilingual knowledge bases? (EMNLP 2023) | 22 | Experimental |
| 36 | TomasrRodrigues/TinyGPT | A research-grade PyTorch implementation of a decoder-only transformer from... | 21 | Experimental |
| 37 | kyegomez/HeptapodLM | An Implementation of an Transformer model that generates tokens non-linearly... | 21 | Experimental |
| 38 | MyDarapy/gpt-1-from-scratch | Rewriting and pretraining GPT-1 from scratch. Implementing Multihead... | 20 | Experimental |
| 39 | FareedKhan-dev/best-introduction-to-transformer | transformer again in the same manner as I did in my previous blog (for both... | 20 | Experimental |
| 40 | fattorib/tritonformer | Trainable transformer with fwd+bwd ops in Triton, matching the performance... | 20 | Experimental |
| 41 | Rohan-Thoma/Coding-attention-from-scratch | This repository consists code for executing attention mechanism from scratch... | 18 | Experimental |
| 42 | jongoiko/minigpt | Training a tiny GPT-like Transformer language model | 18 | Experimental |
| 43 | ashleysally00/transformers-and-attention | Detailed guide to Transformer models that includes both technical and... | 17 | Experimental |
| 44 | scttfrdmn/local-code-model | Pure Go implementation of a GPT-style transformer from scratch - educational... | 17 | Experimental |
| 45 | DataWorshipper/Machine_Translation | This repository implements a Machine Translation system from scratch using... | 17 | Experimental |
| 46 | ambideXtrous9/Transformer-from-Scratch | Transformer from Scratch | 16 | Experimental |
| 47 | tsvlgd/gpt-from-scratch | decoder-only Transformer (GPT) language model coded from scratch in pytorch | 15 | Experimental |
| 48 | GabMartino/TransformerForDummies | Annotated implementation of vanilla Transformers to guide through all the... | 15 | Experimental |
| 49 | Ultron09/Numpy-Transformer | A pure NumPy implementation of GPT built from scratch for educational... | 15 | Experimental |
| 50 | gatorduck/Creating_Custom_Decoder_Transformer | Custom decoder Transformer that treats a patient's medical journey like a... | 14 | Experimental |
| 51 | ZZZ150751/cs336_spring2025_assignment1 | Implementation of a Decoder-only Transformer language model from scratch for... | 14 | Experimental |
| 52 | driellecristine/BERT-Contrastive-LoRA | Enhance BERT fine-tuning for intent classification using supervised... | 14 | Experimental |
| 53 | Harsha-hue/visual-transformer-guide | I built a visual guide explaining how Transformers work. Tokenization... | 14 | Experimental |
| 54 | tulasinnd/Transformer-Decoder-Evolution | This repository contains various decoder-only transformer versions built... | 13 | Experimental |
| 55 | wahabzh/transformer-from-scratch | 🤖 Complete Transformer implementation from scratch using PyTorch. Trained on... | 13 | Experimental |
| 56 | ledesma-ivan/How-Transformer-LLMs-Work | Understand the architecture behind modern Large Language Models. This... | 13 | Experimental |
| 57 | sourize/Decodex | This project implements a decoder-only GPT model from scratch using PyTorch. | 13 | Experimental |
| 58 | Hunain0786/miniTransformer | Mini Transformer (Implemented From Scratch) A from-scratch implementation... | 13 | Experimental |
| 59 | xmarva/transformer-based-architectures | Breakdown of SoTA transformer-based architectures | 11 | Experimental |
| 60 | Pavansomisetty21/Attention-is-All-You-Need-The-Transformer-architecture | In this we explore detailed architecture of Transformer | 11 | Experimental |
| 61 | nabeelshan78/gpt-forge-from-scratch-transformer | A clean, modular implementation of a decoder-only Transformer (mini-GPT)... | 11 | Experimental |
| 62 | SrEntropy/nanoGPT-Transformer | Mastering every concept from the seminal 2017 paper "Attention Is All You... | 10 | Experimental |
| 63 | coxy1989/tfmr | Keras/Tensorflow implementation of the decoder from the transformer as... | 10 | Experimental |