demidovd98/sm-vit
Official repository for the paper "Salient Mask-Guided Vision Transformer for Fine-Grained Classification" (VISIGRAPP '23)
This project classifies images into fine-grained sub-categories, where the classes look very similar to one another. Given images such as photos of different bird species or dog breeds, it predicts the specific species or breed. It is aimed at researchers, zoologists, and anyone else who needs to tell visually similar items apart.
No commits in the last 6 months.
Use this if you need to accurately differentiate between very similar-looking objects within a broader category, like distinguishing between different types of birds, car models, or dog breeds in images.
Not ideal if you only need general object recognition (e.g., just identifying 'a car' instead of 'a specific car model') or if your images contain objects that are highly dissimilar.
Stars: 21
Forks: 2
Language: Python
License: MIT
Category:
Last pushed: Mar 06, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/demidovd98/sm-vit"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
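The same request can be made from Python. The following is a minimal sketch using only the standard library, not an official client. It assumes two things that this page does not confirm: that the URL path follows the pattern category/owner/name (inferred from the single curl example), and that the endpoint returns JSON. The names `quality_url` and `fetch_quality` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.parse
import urllib.request

# Base URL copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: the path is <category>/<owner>/<name>, inferred from the
    single example URL shown on this page.
    """
    return "{}/{}/{}".format(
        BASE_URL,
        urllib.parse.quote(category, safe=""),
        # Keep the slash between owner and repository name unescaped.
        urllib.parse.quote(repo, safe="/"),
    )


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record and decode it as JSON.

    Assumption: the response body is JSON; json.load raises a ValueError
    subclass if it is not.
    """
    url = quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("computer-vision", "demidovd98/sm-vit")` requests the same URL as the curl command. Without a key, each call counts against the limit of 100 requests/day, so cache the result if you poll repeatedly.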
Higher-rated alternatives
BR-IDL/PaddleViT
PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
pathak22/unsupervised-video
[CVPR 2017] Unsupervised deep learning using unlabelled videos on the web
IBM/CrossViT
Official implementation of CrossViT. https://arxiv.org/abs/2103.14899
NVlabs/GCVit
[ICML 2023] Official PyTorch implementation of Global Context Vision Transformers
ViTAE-Transformer/ViTDet
Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object...