shashankvkt/DoRA_ICLR24
This repo contains the official implementation of the ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video".
This project helps machine learning engineers and researchers create powerful image recognition systems without needing massive, manually labeled image datasets like ImageNet. It takes long, unlabelled videos as input and produces image encoder models capable of recognizing objects and patterns. This is ideal for those working on computer vision tasks where collecting and labeling large image datasets is impractical.
No commits in the last 6 months.
Use this if you need to train a robust image recognition model but only have access to large amounts of unlabelled video footage, rather than pre-classified image datasets.
Not ideal if you already have a well-curated, labelled image dataset or if your primary goal is real-time object tracking rather than general image understanding.
Stars: 95
Forks: 12
Language: Python
License: not listed
Category: not listed
Last pushed: May 17, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/shashankvkt/DoRA_ICLR24"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
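The same request can be made from Python using only the standard library. This is a minimal sketch: the `category/owner/repo` path layout is inferred from the single curl example above, the JSON response format is an assumption, and the names `quality_url` and `fetch_quality` are hypothetical helpers, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build an endpoint URL in the same shape as the curl example."""
    # Percent-encode each path segment so stray characters cannot alter the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON.

    Assumption: the endpoint returns JSON. The listing does not document
    the response schema, so the parsed object is returned unchanged.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(quality_url("ml-frameworks", "shashankvkt", "DoRA_ICLR24"))
```

Calling `fetch_quality("ml-frameworks", "shashankvkt", "DoRA_ICLR24")` issues the request itself. Unauthenticated calls count against the 100 requests/day limit, so caching the result locally is a sensible default.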
Higher-rated alternatives
Jittor/jittor
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
berniwal/swin-transformer-pytorch
Implementation of the Swin Transformer in PyTorch.
zhanghang1989/ResNeSt
ResNeSt: Split-Attention Networks
NVlabs/FasterViT
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with...
ViTAE-Transformer/ViTPose
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose...