microsoft/CvT

This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.


Established

This project helps machine learning engineers and researchers build accurate computer vision models more efficiently. It takes collections of labeled images and outputs a trained model that classifies new images, with reported state-of-the-art accuracy at a lower computational cost. The resulting models suit large-scale image classification and can be fine-tuned for downstream vision applications.

602 stars. No commits in the last 6 months.

Use this if you need to train robust image classification models that are accurate while using fewer parameters and less computation than traditional methods.

Not ideal if your primary goal is real-time inference on highly constrained edge devices where every millisecond and byte of memory is critical, as specialized smaller models might be more suitable.

image-classification computer-vision deep-learning-research model-training visual-recognition

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 25 / 25
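
The four category scores above are each out of 25. If the overall score out of 100 is simply their sum (an assumption, since the page does not state the formula), it can be tallied with a short sketch like this one. The names `sub_scores` and `MAX_PER_CATEGORY` are illustrative only.

```python
# Category scores as listed on this page, each out of 25.
sub_scores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 25,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the plain sum of the four categories.
total = sum(sub_scores.values())
maximum = MAX_PER_CATEGORY * len(sub_scores)

print(f"{total} / {maximum}")
```

Under that assumption the project lands near the midpoint of the scale: the full Community score is offset by the zero for Maintenance.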


Stars: 602

Forks: 127

Language: Python

License: MIT

Related frameworks

Jittor/jittor

Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

zhanghang1989/ResNeSt

ResNeSt: Split-Attention Networks

berniwal/swin-transformer-pytorch

Implementation of the Swin Transformer in PyTorch.

NVlabs/FasterViT

[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with...

ViTAE-Transformer/ViTPose

The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose...
