gupta-abhay/pytorch-vit

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

/ 100

Emerging

This project is a PyTorch implementation of the Vision Transformer (ViT), aimed at machine learning engineers and researchers working on image classification. It takes raw images as input and classifies them with a transformer architecture, a model family originally developed for text. It suits computer vision practitioners who want to experiment with transformer-based models.
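To make the "16x16 words" idea from the title concrete, here is a minimal sketch of the patch arithmetic: an image is cut into fixed-size patches, and each flattened patch plays the role a word token plays in text. The helper names `patch_sequence_shape` and `split_into_patches`, and the 224x224 RGB input size used below, are illustrative assumptions and are not taken from this repository's code.

```python
def patch_sequence_shape(height, width, channels=3, patch_size=16):
    """Return (num_patches, patch_dim) for a ViT-style patch split.

    Hypothetical helper for illustration; not part of pytorch-vit.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    # Number of non-overlapping patches along each axis, multiplied together.
    num_patches = (height // patch_size) * (width // patch_size)
    # Each patch is flattened into one vector of pixel values.
    patch_dim = patch_size * patch_size * channels
    return num_patches, patch_dim


def split_into_patches(image, patch_size):
    """Split a single-channel image (list of rows) into flattened patches.

    Patches are returned in row-major order: left to right, top to bottom.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Assumed example input: a 224x224 RGB image with 16x16 patches.
num_patches, patch_dim = patch_sequence_shape(224, 224)
print(num_patches, patch_dim)
```

Under these assumptions the image becomes a sequence of 196 patches with 768 values each. In a full model, each flattened patch would then be linearly projected and passed, together with position information, to a transformer encoder.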

306 stars. No commits in the last 6 months.

Use this if you are developing computer vision models and want to implement advanced Vision Transformer architectures for improved image classification.

Not ideal if you are looking for a plug-and-play solution without any coding, or if your primary focus is traditional convolutional neural networks.

image-classification computer-vision deep-learning machine-learning-research

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 17 / 25


Stars 306

Forks

Language Python

License MIT


Higher-rated alternatives

jaehyunnn/ViTPose_pytorch

An unofficial implementation of ViTPose [Y. Xu et al., 2022]

UdbhavPrasad072300/Transformer-Implementations

Library - Vanilla, ViT, DeiT, BERT, GPT

tintn/vision-transformer-from-scratch

A Simplified PyTorch Implementation of Vision Transformer (ViT)

icon-lab/ResViT

Official Implementation of ResViT: Residual Vision Transformers for Multi-modal Medical Image Synthesis

NVlabs/GroupViT

Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text...
