sanket-poojary-03/Fine-tuning-ViVit
Python script to fine-tune the open-source Video Vision Transformer (ViVit) using the Hugging Face Trainer API
This script helps machine learning engineers and researchers adapt a pretrained video model to their own video classification task. You provide a collection of videos labeled with their categories, and the script fine-tunes the existing ViVit model to assign new videos to one of 10 classes that you define. It suits video understanding work where a labeled dataset is available.
No commits in the last 6 months.
Use this if you need to customize an advanced video classification model to recognize specific actions, objects, or events within your own video datasets.
Not ideal if you don't have a dataset of labeled videos or if you need to classify videos into more than 10 categories without modifying the script.
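To illustrate where a fixed class count such as 10 typically lives in a Hugging Face setup, here is a minimal sketch. It is not the repository's code: the helper names `build_label_maps` and `load_vivit_classifier` are hypothetical, and the checkpoint `google/vivit-b-16x2-kinetics400` is an assumed starting point that the script may or may not use. The number of output classes is derived from the label list, so changing the list is what would change the class count.

```python
from typing import Dict, List, Tuple


def build_label_maps(class_names: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map class names to integer ids and back, sorted so ids are stable across runs."""
    ordered = sorted(set(class_names))
    label2id = {name: idx for idx, name in enumerate(ordered)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


def load_vivit_classifier(class_names: List[str],
                          checkpoint: str = "google/vivit-b-16x2-kinetics400"):
    """Load a pretrained ViViT model with a new head sized to class_names.

    Assumption: the checkpoint name is illustrative only.
    """
    # Imported lazily so the label helper works without transformers installed.
    from transformers import VivitForVideoClassification

    label2id, id2label = build_label_maps(class_names)
    return VivitForVideoClassification.from_pretrained(
        checkpoint,
        num_labels=len(label2id),
        label2id=label2id,
        id2label=id2label,
        # The pretrained head has a different output size, so it is re-initialized.
        ignore_mismatched_sizes=True,
    )


if __name__ == "__main__":
    # Hypothetical class names; more than 10 categories would work the same way.
    classes = [f"class_{i}" for i in range(10)]
    label2id, id2label = build_label_maps(classes)
    print(len(label2id), id2label[0])
```

In this sketch the classification head is re-initialized, so it has to be trained on the labeled videos before its predictions mean anything.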
Stars: 14
Forks: 2
Language: Python
License: —
Category:
Last pushed: Aug 01, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/sanket-poojary-03/Fine-tuning-ViVit"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
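The same request can be made from Python with only the standard library. This is a sketch under assumptions: the path layout (category, then owner, then repository) is inferred from the single example URL above, the response is assumed to be UTF-8 text, and the function names `quality_url` and `fetch_quality` are hypothetical. How an API key is passed is not stated here, so the sketch covers only the keyless case.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Return the raw response body; its format is not documented on this page."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_quality("ml-frameworks", "sanket-poojary-03", "Fine-tuning-ViVit")` would issue the same GET request as the curl command and counts against the daily request limit.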
Higher-rated alternatives
Jittor/jittor
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
berniwal/swin-transformer-pytorch
Implementation of the Swin Transformer in PyTorch.
zhanghang1989/ResNeSt
ResNeSt: Split-Attention Networks
NVlabs/FasterViT
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with...
ViTAE-Transformer/ViTPose
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose...