tintn/vision-transformer-from-scratch
A Simplified PyTorch Implementation of Vision Transformer (ViT)
This project is a compact, readable example of how a Vision Transformer (ViT) model is built and trained for image classification. It takes an image dataset as input and produces a trained model that classifies images into predefined categories. It is aimed at machine learning researchers and students who want to understand the inner workings of ViT models.
241 stars. No commits in the last 6 months.
Use this if you are a machine learning student or researcher seeking a simple, commented codebase to learn the fundamental architecture and training process of a Vision Transformer for image classification.
Not ideal if you need a high-performance, production-ready image classification solution or if you are looking for advanced features beyond a basic implementation.
Stars: 241
Forks: 41
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jun 10, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tintn/vision-transformer-from-scratch"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
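The same request can be made from Python. The sketch below uses only the standard library and mirrors the URL from the curl command above. It rests on two assumptions that this page does not confirm: that the three path segments after `quality/` are a category, a repository owner, and a repository name, and that the endpoint returns JSON. The names `build_quality_url` and `fetch_quality` are hypothetical helpers, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; everything after it is
# assumed to be <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository (hypothetical helper).

    Assumes the response body is JSON; the schema is not documented
    here, so the parsed value is returned as-is.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("transformers", "tintn", "vision-transformer-from-scratch")` issues the network request and counts toward the daily request limit mentioned above, so results are worth caching when polling several repositories.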
Higher-rated alternatives
jaehyunnn/ViTPose_pytorch
An unofficial implementation of ViTPose [Y. Xu et al., 2022]
UdbhavPrasad072300/Transformer-Implementations
Library - Vanilla, ViT, DeiT, BERT, GPT
icon-lab/ResViT
Official Implementation of ResViT: Residual Vision Transformers for Multi-modal Medical Image Synthesis
gupta-abhay/pytorch-vit
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
NVlabs/GroupViT
Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text...