Haochen-Wang409/DropPos
[NeurIPS'23] DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
This repository provides the code for DropPos, a self-supervised pre-training method for Vision Transformers (ViTs): positional embeddings are dropped for a subset of patch tokens, and the model is trained to reconstruct the dropped positions. The resulting pre-trained backbones can be fine-tuned for downstream tasks such as image classification, object detection, and segmentation. It is aimed at machine learning engineers and researchers in computer vision who need robust pre-trained models.
No commits in the last 6 months.
Use this if you need to pre-train Vision Transformers (ViTs) to improve their spatial reasoning and overall performance on various downstream computer vision tasks.
Not ideal if you are looking for a ready-to-use, off-the-shelf application for image analysis without needing to engage in model pre-training or fine-tuning.
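To make the pretext task concrete, here is a minimal NumPy sketch of the idea behind DropPos as described by the paper title, not the repository's actual code: positional embeddings are zeroed out for a random subset of patch tokens, and a toy linear head is scored on predicting each dropped token's original position. All shapes, the drop ratio, and the head are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, dim, drop_ratio = 16, 32, 0.75  # toy sizes, not the paper's

patches = rng.standard_normal((num_patches, dim))    # patch embeddings
pos_embed = rng.standard_normal((num_patches, dim))  # positional embeddings

# Randomly choose which positions lose their positional embedding.
num_drop = int(num_patches * drop_ratio)
dropped = rng.permutation(num_patches)[:num_drop]

mask = np.ones((num_patches, 1))
mask[dropped] = 0.0
tokens = patches + pos_embed * mask  # dropped tokens carry no position info

# Toy position-prediction head: logits over the num_patches possible positions.
W = rng.standard_normal((dim, num_patches)) * 0.02
logits = tokens @ W

# Cross-entropy on the dropped tokens only; target = true position index.
z = logits[dropped]
z = z - z.max(axis=1, keepdims=True)
log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(num_drop), dropped].mean()
print(float(loss))
```

In the real method the head sits on top of a ViT encoder and the reconstruction loss drives representation learning; this sketch only shows the drop-and-classify structure of the objective.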
Stars
62
Forks
4
Language
Python
License
Apache-2.0
Category
computer-vision
Last pushed
Apr 30, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/Haochen-Wang409/DropPos"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
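The same request can be made from Python. Only the endpoint URL above is documented on this page; the shape of the JSON response is not, so the sketch below just builds the URL (the actual request is left commented out to stay within the daily quota).

```python
import json
import urllib.request

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the pt-edge quality endpoint URL for a repository."""
    return f"https://pt-edge.onrender.com/api/v1/quality/{category}/{owner}/{repo}"

url = quality_url("computer-vision", "Haochen-Wang409", "DropPos")
print(url)

# Uncomment to perform the request (100 requests/day without a key):
# with urllib.request.urlopen(url) as resp:
#     print(json.dumps(json.load(resp), indent=2))
```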
Higher-rated alternatives
BR-IDL/PaddleViT
:robot: PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
pathak22/unsupervised-video
[CVPR 2017] Unsupervised deep learning using unlabelled videos on the web
IBM/CrossViT
Official implementation of CrossViT. https://arxiv.org/abs/2103.14899
NVlabs/GCVit
[ICML 2023] Official PyTorch implementation of Global Context Vision Transformers
ViTAE-Transformer/ViTDet
Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object...