gupta-abhay/pytorch-vit
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
This project is a PyTorch implementation of the Vision Transformer (ViT) for image classification. It takes images as input and predicts class labels using a transformer architecture, a model family originally developed for text. It is aimed at machine learning engineers and researchers working on computer vision who want to experiment with transformer-based models.
306 stars. No commits in the last 6 months.
Use this if you are developing computer vision models and want to implement advanced Vision Transformer architectures for improved image classification.
Not ideal if you are looking for a plug-and-play solution without any coding, or if your primary focus is traditional convolutional neural networks.
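The "16x16 words" in the title refers to how a Vision Transformer turns an image into a token sequence: the image is cut into fixed-size patches and each patch is flattened into a vector. The sketch below illustrates that step in plain Python. It is not taken from this repository; the function names `patch_grid` and `split_into_patches` are hypothetical, and a real implementation would operate on tensors rather than nested lists.

```python
def patch_grid(image_size, patch_size=16, channels=3):
    """Return (number of patches, flattened patch length) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side, patch_size * patch_size * channels


def split_into_patches(image, patch_size):
    """Split an H x W image (nested lists of pixel values) into flattened
    patches, ordered left to right and then top to bottom."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block in row-major order.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


if __name__ == "__main__":
    # A 224x224 RGB image with 16x16 patches gives 196 tokens of length 768.
    print(patch_grid(224))
    # Toy 4x4 single-channel image split into 2x2 patches.
    toy = [[row * 4 + col for col in range(4)] for row in range(4)]
    print(split_into_patches(toy, 2))
```

Each flattened patch plays the role of one "word"; the model then projects these vectors linearly and feeds the resulting sequence to a standard transformer encoder.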
Stars: 306
Forks: 36
Language: Python
License: MIT
Category:
Last pushed: Oct 01, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/gupta-abhay/pytorch-vit"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
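The same request can be made from Python with only the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that the path segment before the repository name (`transformers` in the curl example) is a category. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, repo):
    """Build the endpoint URL; `category` is an assumed name for the
    path segment that precedes the owner/repo pair."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category, repo, timeout=10.0):
    """Fetch the record for one repository.

    Assumes the response body is JSON; raises json.JSONDecodeError if not.
    """
    request = urllib.request.Request(
        quality_url(category, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network call and counts against the daily request limit.
    print(fetch_quality("transformers", "gupta-abhay/pytorch-vit"))
```

Keeping the URL builder separate from the network call makes it easy to check the request path without spending any of the daily quota.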
Compare
Higher-rated alternatives
jaehyunnn/ViTPose_pytorch
An unofficial implementation of ViTPose [Y. Xu et al., 2022]
UdbhavPrasad072300/Transformer-Implementations
Library - Vanilla, ViT, DeiT, BERT, GPT
tintn/vision-transformer-from-scratch
A Simplified PyTorch Implementation of Vision Transformer (ViT)
icon-lab/ResViT
Official Implementation of ResViT: Residual Vision Transformers for Multi-modal Medical Image Synthesis
NVlabs/GroupViT
Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text...