Breeze648/Transformer-from-Scratch

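This repository is positioned as an AI paper reproduction: a from-scratch implementation of the Transformer. The code follows the module breakdown of the original paper and includes all of its components (positional encoding, multi-head attention, feed-forward network, encoder-decoder, and so on), along with detailed walkthrough documentation in Chinese and code comments in English, to make study and further development easier.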

Emerging

This project helps AI researchers and students understand the Transformer architecture by providing a clear, modular, and well-documented implementation of the original 'Attention Is All You Need' paper. It takes raw input sequences (like text or other sequential data) and processes them through the Transformer's encoder-decoder structure to produce output sequences, typically for tasks such as translation or text generation. This is ideal for those studying or working with neural machine translation and large language models.
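As a rough illustration of two of the components such an implementation covers, here is a minimal pure-Python sketch of sinusoidal positional encoding and single-head scaled dot-product attention as defined in 'Attention Is All You Need'. It is not taken from this repository; the function names `positional_encoding`, `softmax`, and `scaled_dot_product_attention` are hypothetical, and plain lists stand in for tensors.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal position table: sin on even dimensions, cos on odd ones."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each sin/cos pair (2k, 2k+1) shares the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one head, no masking.

    q, k, v are lists of row vectors (hypothetical stand-ins for tensors).
    """
    d_k = len(k[0])
    out = []
    for q_row in q:
        scores = [
            sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
            for k_row in k
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out
```

Multi-head attention, under the same assumptions, would apply this function to several learned linear projections of the inputs and concatenate the results; that part is omitted here to keep the sketch short.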

No commits in the last 6 months.

Use this if you are an AI researcher or student who wants to deeply understand and potentially modify the foundational Transformer model, with a focus on its core components.

Not ideal if you need a high-level library for immediately deploying large-scale NLP applications without diving into the model's internal workings.

neural-machine-translation large-language-models deep-learning-research nlp-architecture ai-education

Stale 6m · No Package · No Dependents

Maintenance 2 / 25

Adoption 7 / 25

Maturity 15 / 25

Community 17 / 25

Language: Python

License: MIT

Higher-rated alternatives

huggingface/transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in...

kyegomez/LongNet

Implementation of plug in and play Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"

pbloem/former

Simple transformer implementation from scratch in pytorch. (archival, latest version on codeberg)

NVIDIA/FasterTransformer

Transformer related optimization, including BERT, GPT

kyegomez/SimplifiedTransformers

SimplifiedTransformer simplifies transformer block without affecting training. Skip connections,...
