ViTAE-Transformer/ViTDet
Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"
This project helps machine learning engineers and researchers benchmark and improve object detection and segmentation models. It takes pre-trained vision transformer backbones and image datasets as input and produces fine-tuned models that detect and segment multiple objects within an image. The primary users are computer vision practitioners who need state-of-the-art performance.
579 stars. No commits in the last 6 months.
Use this if you are developing or evaluating advanced computer vision systems and need to leverage powerful vision transformer backbones for robust object detection and segmentation.
Not ideal if you are looking for a simple, out-of-the-box solution for basic object recognition without extensive configuration or a deep understanding of model training.
Stars: 579
Forks: 46
Language: Python
License: Apache-2.0
Category:
Last pushed: Apr 24, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/ViTAE-Transformer/ViTDet"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
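The same endpoint can be called from Python. A minimal sketch using only the standard library, assuming the endpoint returns a JSON body (the response format is not documented here); the `quality_url` helper is illustrative, not part of any official client:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch quality data for a repo; assumes a JSON response body."""
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

# Matches the curl example above:
url = quality_url("computer-vision", "ViTAE-Transformer", "ViTDet")
```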
Higher-rated alternatives
BR-IDL/PaddleViT
:robot: PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
pathak22/unsupervised-video
[CVPR 2017] Unsupervised deep learning using unlabelled videos on the web
IBM/CrossViT
Official implementation of CrossViT. https://arxiv.org/abs/2103.14899
NVlabs/GCVit
[ICML 2023] Official PyTorch implementation of Global Context Vision Transformers
bytedance/SPTSv2
The official implementation of SPTS v2: Single-Point Text Spotting