ViTAE-Transformer/ViTDet
Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"
This project helps machine learning engineers and researchers benchmark and improve object detection and segmentation models. It takes pre-trained vision transformer backbones and image datasets as input and produces fine-tuned models that detect and segment multiple objects within an image. The primary users are computer vision practitioners who need state-of-the-art performance.
579 stars. No commits in the last 6 months.
Use this if you are developing or evaluating advanced computer vision systems and need to leverage powerful vision transformer backbones for robust object detection and segmentation.
Not ideal if you are looking for a simple, out-of-the-box solution for basic object recognition without extensive configuration or a deep understanding of model training.
Stars: 579
Forks: 46
Language: Python
License: Apache-2.0
Category:
Last pushed: Apr 24, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/ViTAE-Transformer/ViTDet"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
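The same endpoint can be called from Python. A minimal sketch using only the standard library, assuming the endpoint returns a JSON body (the response format is not documented here); the `quality_url` helper is illustrative, not part of any official client:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch quality data for a repo; assumes a JSON response body."""
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

# Matches the curl example above:
url = quality_url("computer-vision", "ViTAE-Transformer", "ViTDet")
```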
Higher-rated alternatives
BR-IDL/PaddleViT
:robot: PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
pathak22/unsupervised-video
[CVPR 2017] Unsupervised deep learning using unlabelled videos on the web
IBM/CrossViT
Official implementation of CrossViT. https://arxiv.org/abs/2103.14899
NVlabs/GCVit
[ICML 2023] Official PyTorch implementation of Global Context Vision Transformers
bytedance/SPTSv2
The official implementation of SPTS v2: Single-Point Text Spotting