sanket-poojary-03/Fine-tuning-ViVit

Python script for fine-tuning the open-source Video Vision Transformer (ViViT) with the Hugging Face Trainer API

/ 100

Experimental

This script helps machine learning engineers and researchers adapt a pretrained video model to their own video classification task. You provide a collection of videos labeled with their categories, and the script fine-tunes the existing ViViT model to assign new videos to one of the 10 classes you define. It is aimed at video understanding tasks.
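As a rough illustration of the data preparation such a workflow needs, the sketch below maps category names to integer ids and picks a fixed number of evenly spaced frames per video. It is not taken from the repository: the function names, the sample paths, and the 32-frame clip length (the default in Hugging Face's `VivitConfig`) are assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def build_label_maps(categories: Sequence[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct category name to an integer id (sorted for stability)."""
    names = sorted(set(categories))
    label2id = {name: idx for idx, name in enumerate(names)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


def sample_frame_indices(total_frames: int, num_frames: int = 32) -> List[int]:
    """Pick `num_frames` evenly spaced frame indices from a video.

    Videos shorter than `num_frames` repeat frames so the clip length stays fixed.
    The default of 32 is an assumption, not a value confirmed by the repository.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames == 1:
        return [0]
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]


# Hypothetical labeled dataset: (video path, category) pairs.
samples = [("clips/a.mp4", "run"), ("clips/b.mp4", "jump"), ("clips/c.mp4", "run")]
label2id, id2label = build_label_maps([category for _, category in samples])
targets = [label2id[category] for _, category in samples]
```

The integer `targets` are what a classification head is trained against, and the sampled indices select which decoded frames form each fixed-length clip.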

No commits in the last 6 months.

Use this if you need to customize an advanced video classification model to recognize specific actions, objects, or events within your own video datasets.

Not ideal if you don't have a dataset of labeled videos or if you need to classify videos into more than 10 categories without modifying the script.
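One way to avoid a hard-coded class count is to size the classification head from the list of class names. The sketch below assumes the Hugging Face `transformers` library and the public `google/vivit-b-16x2-kinetics400` checkpoint; the listing does not say which checkpoint the script uses, and both helper names are hypothetical.

```python
from typing import Any, Dict, Sequence


def classification_head_kwargs(class_names: Sequence[str]) -> Dict[str, Any]:
    """Build keyword arguments that size the classifier head from the data."""
    names = list(class_names)
    if len(set(names)) != len(names):
        raise ValueError("class names must be unique")
    return {
        "num_labels": len(names),
        "id2label": dict(enumerate(names)),
        "label2id": {name: idx for idx, name in enumerate(names)},
        # The pretrained head has a different number of outputs, so let the
        # loader re-initialise it instead of failing on the shape mismatch.
        "ignore_mismatched_sizes": True,
    }


def load_vivit(class_names: Sequence[str],
               checkpoint: str = "google/vivit-b-16x2-kinetics400"):
    """Load ViViT with a fresh head (requires `transformers` and `torch`).

    The checkpoint name is an assumption for illustration only.
    """
    from transformers import VivitForVideoClassification  # imported lazily

    return VivitForVideoClassification.from_pretrained(
        checkpoint, **classification_head_kwargs(class_names)
    )


# Example with 12 hypothetical classes instead of 10.
kwargs = classification_head_kwargs([f"class_{i}" for i in range(12)])
```

With this approach the number of output classes follows the dataset, so moving from 10 to 12 categories only changes the list passed in.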

video-classification machine-learning-engineering computer-vision custom-model-training

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 10 / 25

Language: Python

License: None

Higher-rated alternatives

Jittor/jittor

Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

berniwal/swin-transformer-pytorch

Implementation of the Swin Transformer in PyTorch.

zhanghang1989/ResNeSt

ResNeSt: Split-Attention Networks

NVlabs/FasterViT

[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with...

ViTAE-Transformer/ViTPose

The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose...
