Transformer Architecture Education Models
63 transformer architecture education models are tracked. One scores above 70 (Verified tier). The highest-rated is huggingface/transformers at 87/100 with 157,811 stars. One of the top 10 is actively maintained.
Get all 63 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=transformer-architecture-education&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
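The same request can be made from a script. The following is a minimal sketch using only the Python standard library; it assumes the endpoint returns a JSON body, which the page implies but does not show. The helper names `build_url` and `fetch_projects` are hypothetical, and the query parameters are exactly those in the curl command above. Note that the example URL sets `limit=20`; whether a larger `limit` returns all 63 projects in one response is an assumption not confirmed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers",
              subcategory="transformer-architecture-education",
              limit=20):
    """Build the dataset URL with the query parameters shown in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and decode the response as JSON.

    The shape of the returned object is not documented on this page,
    so callers should inspect it before relying on specific keys.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a network request, counts against the daily limit):
#   data = fetch_projects(build_url())
#   print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to check the request before spending one of the 100 keyless requests per day.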
| # | Model | Score | Tier |
|---|---|---|---|
| 1 | huggingface/transformers<br>🤗 Transformers: the model-definition framework for state-of-the-art machine... | | Verified |
| 2 | kyegomez/LongNet<br>Implementation of plug in and play Attention from "LongNet: Scaling... | | Established |
| 3 | pbloem/former<br>Simple transformer implementation from scratch in pytorch. (archival, latest... | | Emerging |
| 4 | NVIDIA/FasterTransformer<br>Transformer related optimization, including BERT, GPT | | Emerging |
| 5 | kyegomez/SimplifiedTransformers<br>SimplifiedTransformer simplifies transformer block without affecting... | | Emerging |
| 6 | ARM-software/keyword-transformer<br>Official implementation of the Keyword Transformer: https://arxiv.org/abs/2104.00769 | | Emerging |
| 7 | ChangwenXu98/TransPolymer<br>Implementation of "TransPolymer: a Transformer-based language model for... | | Emerging |
| 8 | IBM/regression-transformer<br>Regression Transformer (2023; Nature Machine Intelligence) | | Emerging |
| 9 | bytedance/effective_transformer<br>Running BERT without Padding | | Emerging |
| 10 | bayesgroup/code_transformers<br>Empirical Study of Transformers for Source Code & A Simple Approach for... | | Emerging |
| 11 | ShivamRajSharma/Transformer-Architectures-From-Scratch<br>Implementation of transformers based architecture in PyTorch. | | Emerging |
| 12 | dashstander/block-recurrent-transformer<br>Pytorch implementation of "Block Recurrent Transformers" (Hutchins & Schlag... | | Emerging |
| 13 | Breeze648/Transformer-from-Scratch<br>This repository is positioned as AI paper reproduction / a from-scratch Transformer implementation. ... | | Emerging |
| 14 | octanove/shiba<br>Pytorch implementation and pre-trained Japanese model for CANINE, the... | | Emerging |
| 15 | YadaYuki/transformer-from-scratch<br>Transformer from scratch 🙊 (English to Japanese Translator by PyTorch) | | Emerging |
| 16 | Whiax/BERT-Transformer-Pytorch<br>Basic implementation of BERT and Transformer in Pytorch in one short python... | | Emerging |
| 17 | pmichel31415/are-16-heads-really-better-than-1<br>Code for the paper "Are Sixteen Heads Really Better than One?" | | Emerging |
| 18 | dcaffo98/transpormer<br>TranSPormer: a transformer for the Travelling Salesman Problem | | Emerging |
| 19 | amazon-science/transformers-data-augmentation<br>Code associated with the "Data Augmentation using Pre-trained Transformer... | | Emerging |
| 20 | THUDM/Multilingual-GLM<br>The multilingual variant of GLM, a general language model trained with... | | Emerging |
| 21 | forgi86/sysid-transformers<br>Code to reproduce the results of the paper In-context learning for... | | Emerging |
| 22 | nanowell/Differential-Transformer-PyTorch<br>PyTorch implementation of the Differential-Transformer architecture for... | | Emerging |
| 23 | submarat/removing-layer-norm<br>Transformers Don’t Need LayerNorm at Inference Time | | Emerging |
| 24 | chrisjob1021/transformer-examples<br>A collection of educational toy implementations and examples of key... | | Emerging |
| 25 | shamspias/Transformers-and-Large-Language-Models-From-Basics-to-Frontier-Research<br>Dive into the transformative world of NLP with this guide on Transformers.... | | Emerging |
| 26 | IParraMartin/An-Explanation-Is-All-You-Need<br>The original transformer implementation from scratch. It contains... | | Emerging |
| 27 | LoserCheems/WonderfulMatrices<br>Wonderful Matrices to Build Small Language Models | | Experimental |
| 28 | fabienfrfr/tptt<br>😊 TPTT: Transforming Pretrained Transformers into Titans | | Experimental |
| 29 | HSaurabh0919/CTransformers<br>Implementing wide variety of transformers, fine tuning as well as trying... | | Experimental |
| 30 | kyegomez/MLXTransformer<br>Simple Implementation of a Transformer in the new framework MLX by Apple | | Experimental |
| 31 | januverma/transformers-stuff<br>Codes, scripts, and notebooks on various aspects of transformer models. | | Experimental |
| 32 | BruinGrowly/URI_Transformer<br>URI-Transformer: Universal Reality Interface - A revolutionary artificial... | | Experimental |
| 33 | SauravP97/toy-transformer<br>A decoder only Transformer implementing masked attention | | Experimental |
| 34 | abgache/NanoGPL<br>Small test generative pre-trained LAM (Linear Attention Mechanism). | | Experimental |
| 35 | daniel-furman/polyglot-or-not<br>Are foundation LMs multilingual knowledge bases? (EMNLP 2023) | | Experimental |
| 36 | TomasrRodrigues/TinyGPT<br>A research-grade PyTorch implementation of a decoder-only transformer from... | | Experimental |
| 37 | kyegomez/HeptapodLM<br>An Implementation of an Transformer model that generates tokens non-linearly... | | Experimental |
| 38 | MyDarapy/gpt-1-from-scratch<br>Rewriting and pretraining GPT-1 from scratch. Implementing Multihead... | | Experimental |
| 39 | FareedKhan-dev/best-introduction-to-transformer<br>transformer again in the same manner as I did in my previous blog (for both... | | Experimental |
| 40 | fattorib/tritonformer<br>Trainable transformer with fwd+bwd ops in Triton, matching the performance... | | Experimental |
| 41 | Rohan-Thoma/Coding-attention-from-scratch<br>This repository consists code for executing attention mechanism from scratch... | | Experimental |
| 42 | jongoiko/minigpt<br>Training a tiny GPT-like Transformer language model | | Experimental |
| 43 | ashleysally00/transformers-and-attention<br>Detailed guide to Transformer models that includes both technical and... | | Experimental |
| 44 | scttfrdmn/local-code-model<br>Pure Go implementation of a GPT-style transformer from scratch - educational... | | Experimental |
| 45 | DataWorshipper/Machine_Translation<br>This repository implements a Machine Translation system from scratch using... | | Experimental |
| 46 | ambideXtrous9/Transformer-from-Scratch<br>Transformer from Scratch | | Experimental |
| 47 | tsvlgd/gpt-from-scratch<br>decoder-only Transformer (GPT) language model coded from scratch in pytorch | | Experimental |
| 48 | GabMartino/TransformerForDummies<br>Annotated implementation of vanilla Transformers to guide through all the... | | Experimental |
| 49 | Ultron09/Numpy-Transformer<br>A pure NumPy implementation of GPT built from scratch for educational... | | Experimental |
| 50 | gatorduck/Creating_Custom_Decoder_Transformer<br>Custom decoder Transformer that treats a patient's medical journey like a... | | Experimental |
| 51 | ZZZ150751/cs336_spring2025_assignment1<br>Implementation of a Decoder-only Transformer language model from scratch for... | | Experimental |
| 52 | driellecristine/BERT-Contrastive-LoRA<br>Enhance BERT fine-tuning for intent classification using supervised... | | Experimental |
| 53 | Harsha-hue/visual-transformer-guide<br>I built a visual guide explaining how Transformers work. Tokenization... | | Experimental |
| 54 | tulasinnd/Transformer-Decoder-Evolution<br>This repository contains various decoder-only transformer versions built... | | Experimental |
| 55 | wahabzh/transformer-from-scratch<br>🤖 Complete Transformer implementation from scratch using PyTorch. Trained on... | | Experimental |
| 56 | ledesma-ivan/How-Transformer-LLMs-Work<br>Understand the architecture behind modern Large Language Models. This... | | Experimental |
| 57 | sourize/Decodex<br>This project implements a decoder-only GPT model from scratch using PyTorch. | | Experimental |
| 58 | Hunain0786/miniTransformer<br>Mini Transformer (Implemented From Scratch) A from-scratch implementation... | | Experimental |
| 59 | xmarva/transformer-based-architectures<br>Breakdown of SoTA transformer-based architectures | | Experimental |
| 60 | Pavansomisetty21/Attention-is-All-You-Need-The-Transformer-architecture<br>In this we explore detailed architecture of Transformer | | Experimental |
| 61 | nabeelshan78/gpt-forge-from-scratch-transformer<br>A clean, modular implementation of a decoder-only Transformer (mini-GPT)... | | Experimental |
| 62 | SrEntropy/nanoGPT-Transformer<br>Mastering every concept from the seminal 2017 paper "Attention Is All You... | | Experimental |
| 63 | coxy1989/tfmr<br>Keras/Tensorflow implementation of the decoder from the transformer as... | | Experimental |