kyegomez/MegaVIT
The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"
This project offers an open-source implementation of a very large vision transformer that classifies images into 1000 categories: it takes a raw image as input and outputs a prediction of what the image contains. It is aimed at machine learning researchers and practitioners who want state-of-the-art image recognition for computer vision tasks.
Use this if you are a machine learning researcher or engineer building or experimenting with large-scale image classification and recognition systems.
Not ideal if you are looking for a simple, out-of-the-box solution for common image tasks without deep technical understanding or access to significant computational resources.
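Like other vision transformers, the model described above starts by splitting the input image into fixed-size patches, each flattened into a token before attention is applied. As a rough illustration only (this is not the repository's code, and the patch size of 14 is borrowed from the ViT-22B paper, not confirmed for this implementation), the patchify step can be sketched in NumPy:

```python
import numpy as np

def patchify(img: np.ndarray, patch: int) -> np.ndarray:
    """Split a (C, H, W) image into flattened non-overlapping patches.

    Returns an array of shape (num_patches, C * patch * patch), i.e. one
    token per patch, ready for a linear projection into the model dim.
    """
    c, h, w = img.shape
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    return (
        img.reshape(c, h // patch, patch, w // patch, patch)
           .transpose(1, 3, 0, 2, 4)          # group the two patch-grid axes first
           .reshape((h // patch) * (w // patch), c * patch * patch)
    )

# A 224x224 RGB image with patch size 14 yields a 16x16 grid of patches,
# each flattened to 3 * 14 * 14 = 588 values.
img = np.zeros((3, 224, 224), dtype=np.float32)
tokens = patchify(img, 14)
print(tokens.shape)  # → (256, 588)
```

The actual model then projects each token, adds positional information, and runs the tokens through transformer blocks before a 1000-way classification head.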
Stars
32
Forks
1
Language
Python
License
MIT
Category
Last pushed
Feb 06, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/kyegomez/MegaVIT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
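The same endpoint can be called from Python. The sketch below only builds the request URL from the path segments visible in the curl example above; `quality_url` is a hypothetical helper, not part of any published client, and the actual response schema is not documented here.

```python
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-API URL for a repo (path layout assumed from the example)."""
    return f"{BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

url = quality_url("transformers", "kyegomez", "MegaVIT")
print(url)  # → https://pt-edge.onrender.com/api/v1/quality/transformers/kyegomez/MegaVIT
# To fetch it (network call, rate-limited to 100 requests/day without a key):
#   import urllib.request
#   data = urllib.request.urlopen(url).read()
```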
Higher-rated alternatives
pairlab/SlotFormer
Code release for ICLR 2023 paper: SlotFormer on object-centric dynamics models
ChristophReich1996/Swin-Transformer-V2
PyTorch reimplementation of the paper "Swin Transformer V2: Scaling Up Capacity and Resolution"...
prismformore/Multi-Task-Transformer
Code of ICLR2023 paper "TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene...
DirtyHarryLYL/Transformer-in-Vision
Recent Transformer-based CV and related works.
uakarsh/latr
Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal...