Vision Transformer Classification (Transformer Models)

Tools and models for image classification using transformer architectures (Vision Transformers, SigLIP, BEiT, etc.). Does NOT include general image captioning, vision-language retrieval, or multi-label classification frameworks without transformer-based implementations.

There are 22 vision transformer classification models tracked. The highest-rated is QData/C-Tran at 45/100 with 280 stars.

Get the projects as JSON (the request below sets limit=20, so it returns at most 20 of the 22):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=vision-transformer-classification&limit=20"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free API key.
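For scripted access, the same request can be sketched with Python's standard library. This is a minimal sketch, not an official client: it reuses only the endpoint and the three query parameters (`domain`, `subcategory`, `limit`) visible in the curl command, assumes anonymous access within the 100 requests/day allowance, and makes no assumption about the structure of the returned JSON. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint copied from the curl example; the helpers below are hypothetical,
# not part of any published client library.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str, subcategory: str, limit: int) -> str:
    """Build the request URL with properly encoded query parameters."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_quality(domain: str, subcategory: str, limit: int, timeout: float = 30.0):
    """Send an anonymous GET request and decode the JSON response body.

    The response schema is not documented on this page, so the decoded
    object is returned as-is instead of being parsed into specific fields.
    """
    url = build_quality_url(domain, subcategory, limit)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_quality("transformers", "vision-transformer-classification", 22)` would request all 22 projects, assuming the API accepts a limit above 20. Building the URL with `urlencode` rather than string concatenation keeps the parameters correctly escaped if other domain or subcategory values contain reserved characters.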

| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | QData/C-Tran | General Multi-label Image Classification with Transformers | 45 | Emerging |
| 2 | jesus3476/Fire-Detection-Siglip2 | Fire-Detection-Siglip2 is an image classification vision-language encoder... | 41 | Emerging |
| 3 | pagraf/Seabed-Net | Quick start guide for Seabed-Net | 37 | Emerging |
| 4 | moharamfatema/graduation-project | Video vision transformers for hierarchical anomaly detection in video scenes. | 35 | Emerging |
| 5 | apollosoldier/Advanced-Classifier | The Advanced Classification Model is a deep learning-based approach for... | 31 | Emerging |
| 6 | PRITHIVSAKTHIUR/Fire-Detection-Siglip2 | Fire-Detection-Siglip2 is an image classification vision-language encoder... | 29 | Experimental |
| 7 | mohsenMahmoodzadeh/image-and-text-classifier | Deep learning models (CNN, LSTM, BERT) for image and text classification task... | 27 | Experimental |
| 8 | kunjmehta/cross-modal-retrieval-food-ai | Course project for 198:536 at Rutgers University. The project is about... | 22 | Experimental |
| 9 | zaaachos/Thesis-Diagnostic-Captioning | B.Sc. Thesis Deep Learning & NLP research on Medical Image Captioning | 20 | Experimental |
| 10 | PRITHIVSAKTHIUR/Gym-Workout-Classifier-SigLIP2 | Gym-Workout-Classifier-SigLIP2 is an image classification vision-language... | 19 | Experimental |
| 11 | AD-Archer/hugging-face-foodguesser | Food Category Classification - A Python tool that uses deep learning to... | 18 | Experimental |
| 12 | 00200200/Video-Waste-Dumping-Detection---IWDD | International Contest on Illegal Waste Dumping Detection | 18 | Experimental |
| 13 | PRITHIVSAKTHIUR/Painting-126-DomainNet | Painting-126-DomainNet is an image classification vision-language encoder... | 18 | Experimental |
| 14 | PRITHIVSAKTHIUR/Traffic-Density-Classification | Traffic-Density-Classification is an image classification vision-language... | 17 | Experimental |
| 15 | amgawishx/dnn_vision_classifiers | End-to-end ML pipeline for training different DNN vision classifiers and... | 17 | Experimental |
| 16 | lawrenceokolo1/vit-faiss-product-recommendation | Production-grade visual product recommendation using ViT + FAISS on Amazon... | 13 | Experimental |
| 17 | daniel-furman/CV-feature-eng-experiments | Hugging Face models are all you need for “vanilla” image classification | 11 | Experimental |
| 18 | arnabd64/Transformers-Image-Classification | Fine Tune a Transformers based Image Classification on Google Colab model... | 11 | Experimental |
| 19 | RHasan97/Recipe-classifier | This model can classify 55 different types of food based on the food... | 11 | Experimental |
| 20 | zeeshanAhsan1/Image-Classification-Using-Vision-Transformers | Academic Project where Image Classification has been explored using CNN... | 11 | Experimental |
| 21 | karthek-git/gic | Efficient classification of Mobile Gallery Images | 11 | Experimental |
| 22 | warun1801/Transformer-based-Video-Classifier | A transformer based video classification model. It is being used for... | 10 | Experimental |